

PRELIMINARY STUDY FOR A
NUMERICAL AERODYNAMIC SIMULATION FACILITY.
PHASE 1, EXTENSION (Control Data Corp., St.
Paul, Minn.) 434 p HC A19/MF A01 CSCL 01A

G3/02

N78-19052

Unclas

08630


PRELIMINARY STUDY
FOR A 

NUMERICAL AERODYNAMIC SIMULATION FACILITY 


SUMMARY REPORT - PHASE 1 EXTENSION 


By: N. R. Lincoln 


FEBRUARY, 1978 


Distribution of this report is provided in the interest of information 
exchange. Responsibility for the contents resides in the authors or 
organization that prepared it. 


Prepared under Contract No. NAS2-9457 by: 


CONTROL DATA CORPORATION 
Research and Advanced Design Laboratory 
4290 Fernwood Street 
St. Paul, Minnesota 55112 


for 


AMES RESEARCH CENTER 

NATIONAL AERONAUTICS AND SPACE ADMINISTRATION 




SUMMARY REPORT - PHASE 1 EXTENSION


Phase 1 of the NASF study, which was completed in October 1977, produced several conclusions about the
feasibility of constructing a flow model simulation facility. A computer structure was proposed for the
Navier-Stokes Solver (NSS), now called the Flow Model Processor (FMP), along with technological and
system approaches. Before such a system can enter an intensive design investigation phase, several tasks
must be accomplished to establish uniformity and control over the remaining design steps, and to clarify
and amplify certain portions of the conclusions drawn in Phase 1.

In order of priority these were seen as: 

1. Establishing a structure and format for documenting the design and implementation
of the FMP facility.

2. Developing a complete, practically engineered design that would perform as claimed 
in the Phase 1 report. 

3. Creating a design verification tool for NASA analysts, using a computerized simulation 
system. 

4. Identifying key elements of the flow model three-dimensional codes to be used as 
metrics for verifying the progress of FMP design against a set of predefined system 
objectives. 

5. Developing a programming language specification for a proposed FMP language to be 
tested by Ames and RADL personnel. 

6. Coding of the key elements in the experimental language. 

7. Hand compilation of the encoded program segment. 

8. Submission of the hand-compiled elements to the FMP simulator for timing and data 
flow analysis. 

9. Documentation and “packaging” of the resulting simulator and input code to permit 
NASA personnel to continue experiments and analysis. 

10. Development of sufficiently detailed functional descriptions as to permit NASA personnel 
to develop their own simulations where necessary. 

11. Refinement of Reliability Analysis to include realistic estimates of component counts. 

These tasks were attacked with all of the Phase 1 personnel plus three additional designers and mathematicians.
The major expenditure of time was in the revision and re-revision of the computer design and the production
of a CPU Instruction Specification and Functional Specification, which are included as appendices to the
Final Report and which were fundamental to all but the first task listed. An outline of the form and
desired content of the basic specifications for software and hardware was produced as guidance and as a
structural skeleton for future contract phases. The 3-D code analysis and language specification were not completed
in this phase, as the language direction was changed several times in this admittedly preliminary study phase
of the project.
PROJECT STATUS AND CONCLUSIONS


Most of the tasks undertaken in this extension phase were continuations of work initiated in Phase 1 of
the NASF study. All of the tasks are expected to continue in some form as the full NASF architecture
is defined in detail, and finally designed and implemented in subsequent phases of this project. The “best
effort” engaged by Control Data Corporation has resulted in the following task status:

1. Hardware description 

a. Performance metrics — The three-dimensional implicit code was analyzed to determine 
if extrapolations from the two-dimensional code done in Phase 1 were still valid. It 
was found that the Ames guidance regarding the extrapolations was sound, and the
conclusions of the Phase 1 report about the computational load imposed by the 3-D
code still hold.

No time was spent on the explicit code beyond determining the FMP instruction
requirements for data-dependent processing, which differs in part from the implicit code
behavior. The result was a decision to retain the APL vector operations of vector
search and compare and the corresponding operations of compress, mask, and merge.

Bit string operations on long strings were treated as an unnecessary complication
to the hardware and supplanted by a scalar (nonvector) form of handling bit string
operations. The performance degradation due to this change was estimated at less
than 1 percent for the entire code execution.

Pertinent segments of the implicit code were identified for further study. The sequence
of subroutines or subprocesses used in forming the tridiagonal matrices from sweeps in
the three mesh directions, together with the tridiagonal solver, constitutes the obvious key
computational burden of the implicit code, while the memory accessing patterns for both
Memory and Backing Store can be derived from examination of the AMATRX,
FILTRX, FILTRY, FILTRZ sequences. The conclusion is that for a “first-cut” validation
of the FMP hardware architecture, a minimum of these subprocesses and the BTRI/
LUDEC (block tridiagonal solver and LU decomposer for the solver) should be
programmed, hand-compiled, and simulated on the block-level simulator. In addition, the
mathematical behavior of these sequences should be analyzed to determine whether, in all
practical cases, the arithmetic can be done in 32-bit mode to improve the throughput
characteristics of this code in the 64/32-bit pipelines of the proposed FMP.

A hand compilation of a segment at the beginning of the J sweep for the
left-hand-side solution was accomplished. The limited code generated was constrained
by the time remaining after decisions were finalized on the FORTRAN extensions
to be proposed. This region of the code is critical to the performance of the
FMP since it exercises the memory system in the most inefficient manner possible
in the implicit code. Data cannot be streamed directly into vector arithmetic
operations, but must first be GATHERED from discontiguous columns of data in
the original mesh. With the minimal resources remaining, it was decided that this
sequence was the most interesting to hand-compile and block-model with the GPSS
simulator.

The sequence of instructions was submitted to the Version 1 FMP model, with the
result that a floating-point rate of 933 megaflops was achieved under these worst-case
conditions. The implication (which will be explored more thoroughly in the next
study phase) is that this small segment is truly the worst of the worst cases, and
thus the average computation rate will be well in excess of the one-gigaflop
threshold sought by NASA.

At the time of this writing, the hand compiling and block simulation have not been
completed, but should be available at the time of submission of the final report.
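
The gather step described above can be illustrated with a short sketch. It assumes a FORTRAN-style storage order (first index varying fastest) for an array indexed (KL, N, J), as in the 3-D code, and uses tiny hypothetical dimensions. The names flat_index, gather, and j_line are illustrative only and are not FMP instructions.

```python
# Hypothetical, tiny dimensions; the real mesh is far larger.
NKL, NN, NJ = 4, 2, 3

def flat_index(kl, n, j):
    """Zero-based address of element (kl, n, j) when the first index varies fastest."""
    return kl + NKL * (n + NN * j)

# Fill memory so that each value equals its own address, which makes strides visible.
memory = list(range(NKL * NN * NJ))

def gather(mem, start, stride, count):
    """Collect `count` elements spaced `stride` apart into a contiguous vector."""
    return [mem[start + i * stride] for i in range(count)]

def j_line(mem, kl, n):
    """Gather one line in J at fixed (kl, n); consecutive J values are NKL*NN apart."""
    return gather(mem, flat_index(kl, n, 0), NKL * NN, NJ)

line = j_line(memory, kl=1, n=1)
print(line)  # addresses 5, 13, 21: discontiguous in memory, contiguous after the gather
```

Only after this collection step can the J line be streamed through vector arithmetic as a contiguous operand, which is why the J sweep stresses the memory system more than sweeps along the fastest-varying index.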

b. Functional design — Design of the originally proposed Control Data FMP was revised and 
a preliminary set of functional and instruction specifications was produced for the FMP. 
The major design changes involved two main thrusts: 

• Reduction of complexity of the Map Unit and Vector Unit by 
constraining the generality originally proposed for those units. 

• Improvements in reliability by increasing the amount of Single Error
Correction Double Error Detection (SECDED) encoding throughout the Vector
Units and associated buffers, the addition of more checking logic, and
an additional pipeline for “instantaneous” swapping into the vector
arithmetic ensemble as failures are detected.
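
As a reminder of what Single Error Correction Double Error Detection provides, the following is a minimal sketch using the smallest textbook case, an extended Hamming (8,4) code. The word widths and check-bit layout actually used in the FMP are not stated here, so the layout below and the names secded_encode and secded_decode are assumptions made purely for illustration.

```python
def secded_encode(nibble):
    """Encode 4 data bits as an 8-bit extended Hamming codeword (list of bits).

    Index 0 holds the overall parity bit; indices 1..7 are the Hamming
    positions, with check bits at 1, 2, 4 and data bits at 3, 5, 6, 7.
    """
    code = [0] * 8
    code[3] = (nibble >> 0) & 1
    code[5] = (nibble >> 1) & 1
    code[6] = (nibble >> 2) & 1
    code[7] = (nibble >> 3) & 1
    code[1] = code[3] ^ code[5] ^ code[7]   # covers positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]   # covers positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]   # covers positions 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2             # makes the total parity even
    return code

def secded_decode(code):
    """Return (nibble, status); status is 'ok', 'corrected', or 'double-error'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos                 # XOR of the positions of set bits
    parity_odd = sum(code) % 2 == 1
    if syndrome == 0 and not parity_odd:
        status = "ok"
    elif parity_odd:
        code[syndrome] ^= 1                 # single error; syndrome 0 means bit 0 itself
        status = "corrected"
    else:
        return None, "double-error"         # nonzero syndrome with even parity
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return nibble, status

word = secded_encode(0b1011)
word[6] ^= 1                                # inject a single-bit fault
print(secded_decode(word))                  # the data is recovered and flagged as corrected
```

A single flipped bit is repaired silently, while two flipped bits are reported rather than miscorrected. This is the property that lets a periodic maintenance pass clear accumulated single errors before a second error in the same word turns them into a failure.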

The technological risks of using a new circuit family in the FMP, which were strongly
emphasized in the Phase 1 report, were reexamined, and the decision was made to
proceed on two concurrent paths in the development of the FMP. These were to
pursue the new technological developments aggressively while at the same time assessing
the configurations and reliabilities of the FMP as if it were to be built out of
existing ECL LSI as used in the STAR-100 program. It was decided to take an
approach to the newer generation LSI that would not invalidate all of the
architectural or block-level design, so that those tasks could proceed somewhat
independently of the technology development. This decision permits Control
Data to postpone the recommendation of a logic family until much later in the
design cycle, and thus to delay consideration until firmer commitments can be
made regarding the risks and schedules for a new logic family. The bottom line
is that an FMP can be built, and operated with acceptable
reliability, from the existing LSI family being employed on the STAR-100A.

The block-level simulation system required by Ames personnel to permit an ongoing 
verification of the design and the overall performance objectives is under development. 

A last minute decision was made to base the first, highest-level design model on the 
readily available General Purpose Simulation System (GPSS) so that it could be easily 
installed at any site performing FMP analysis. 


The first version of the simulator has been completed and is being delivered under 
separate cover to NASA Ames with the submission of this Final Report. The 
simulator provides a substantial amount of statistical data about the behavior of the 
various FMP units (Swap, Map, Memory, Scalar, Vector) when executing code sequences. 
This data permits the designers to evaluate alternate strategies for organization and 
implementation of the major components of the FMP. NASA analysts can use the 
data to verify that the internal characteristics of the FMP as documented in the 
Functional and Instruction Specifications are truly represented by the current version 
of the GPSS model. 

Test runs of the model immediately prior to completing this report have disclosed
that there is need for refinement in some areas; thus the model is not complete
to the extent needed by the design engineers. It is adequate in current form,
however, for NASA analysts to evaluate the behavior of these initial FMP designs.

c. Reliability assessment - A reliability analysis of the revised FMP design was
conducted, using experimental statistics from existing logic family data, expectations of
the LSI and memory characteristics of componentry borrowed from the STAR-100A
project, and projections made by our technological team for as yet untried technological
developments. The key conclusion was that the total FMP, including Backing Storage,
would suffer an expected failure rate of 4 per operating month, if a reasonable
maintenance schedule was followed to clean out single errors that are automatically
corrected by the SECDED networks. This failure rate is expected from the existing
technology family of LSI. The failure rate is almost halved if a newer technology
can be utilized for the FMP.

Since the major rates of failure (system interruption) revolve around the Vector
Units, an additional (ninth) unit was designed into the FMP to permit quick restart
capability, thus improving the overall machine availability.
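
To put the quoted failure rate in more familiar terms, the short calculation below converts failures per operating month into mean hours between failures. The length of an "operating month" is not defined in this summary; 720 hours (30 days of continuous operation) is an assumption, and the figure of 2 failures per month for the newer technology is used only as an illustrative bound for "almost halved".

```python
# Assumption: an operating month is 30 days of round-the-clock operation.
HOURS_PER_OPERATING_MONTH = 30 * 24

def mean_hours_between_failures(failures_per_month,
                                hours_per_month=HOURS_PER_OPERATING_MONTH):
    """Convert a monthly failure count into mean operating hours between failures."""
    return hours_per_month / failures_per_month

existing_lsi = mean_hours_between_failures(4)   # rate quoted for the existing LSI family
newer_family = mean_hours_between_failures(2)   # illustrative bound for an exactly halved rate

print(existing_lsi, newer_family)
```

Under these assumptions the existing LSI family gives roughly one interruption per week of continuous running, and the newer family would approach, but not quite reach, one per two weeks.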

2. Software description 

a. Programming language — A software description was developed, providing a “straw-man” 
FORTRAN language to be tested by Ames and Control Data programmers, and a 
description of the fundamental operating system properties required by the NASF,
constructed in a format suitable for evolving into an operating system specification.

The FORTRAN dialect was created after much thought about automatic vectorization of
standard FORTRAN constructs led Control Data to the conclusion that the compiler
writers and compilers need a little assistance in the generation of optimum object code
for parallel machines. Ames staff comments on the original extensions to FORTRAN,
based on the STAR FORTRAN compiler, led RADL researchers to abandon the multitude
of extensions therein and concentrate on the definition of a CODO (Concurrent DO loop)
structure, within which all operations are explicitly vector.
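
The idea behind a loop whose body is explicitly vector can be sketched as follows. This is not CODO syntax, which is not given in this summary; it is a hedged illustration, with hypothetical function names, of why a loop with independent iterations can be rewritten so that each statement operates on whole vectors.

```python
def scalar_loop(u, r):
    """Conventional DO-loop form: one element is processed per iteration."""
    out = [0.0] * len(u)
    for j in range(len(u)):
        out[j] = u[j] * r[j] + 1.0
    return out

def concurrent_form(u, r):
    """Vector form: each statement acts on every element before the next statement runs."""
    prod = [a * b for a, b in zip(u, r)]    # one vector multiply
    return [p + 1.0 for p in prod]          # one vector add

u = [1.0, 2.0, 3.0, 4.0]
r = [0.5, 0.25, 2.0, 1.0]
print(scalar_loop(u, r) == concurrent_form(u, r))
```

The two forms agree because no iteration reads a value written by another iteration. Declaring that property in the source, rather than asking the compiler to prove it, is the kind of assistance to the compiler that the paragraph above describes.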

The minimization of language enhancements reduces the risks attendant to retraining
programming personnel, as well as major compiler developments, to implement new
syntax. This simplification is accomplished at the cost of more intensive effort on
compiler optimization of source code. The optimization techniques are not unique
to the FMP in that they have their direct counterparts in the scalar optimization
performed in most product compilers available in 1978. The risk of providing a
mature compiler with this capability is reasonably low compared to the alternative
of creating a wholly new language and compiler system from scratch.

b. Operating system — The operating system description attempts to reduce to a bare 
minimum those functions which must uniquely be implemented in the FMP. Further, 
every attempt must be made to utilize existing system software on the non-FMP 
processors, which will be procured essentially off-the-shelf, in order to minimize the
software risks highlighted in the Phase 1 report. 

The operating system document is intended to evolve into formal specification form in
later phases of this project, as are all portions of the NASF project. A system should
be created to handle the updating and editing of these documents as a central means
for coordinating the total project. To this end, outlines for the specification of language
structure and compiler structure should have been completed in this study phase.

The prototypes originally intended to be used for this function (the ANS 77 language
specification, and the STAR FORTRAN internal maintenance specification) proved
insufficient (and in some cases unnecessarily complicated) to fill this role. The outline
of these specifications should be done at the earliest possible moment in the next
project phase.


FINAL WORD 

The achievement of the original NASF project goals appears at this point to be more possible than when 
the feasibility was first examined at the beginning of Phase 1. Continued study of the code requirements 
and refinement of the design have led to simplifications in the FMP that reduce the software and hardware 
risks below those originally derived for the NASF. A major factor in reducing risks and meeting the 
performance and schedule goals for the FMP rests in the willingness of the algorithm developers to bend
their thinking somewhat toward a more intimate knowledge of the hardware for which they are creating
programs. This flexibility in adapting thought and code to a special machine architecture has been a major,
positive characteristic of the Ames flow model mathematicians and programmers, with their experience on
ILLIAC IV and the 7600 on which to draw. Thus the success of the FMP seems assured if the intimate
marriage of code developers, hardware designers, and technologists can continue for the life of the project.
PREFACE


This Final Report presents additional findings resulting from an extension of the Preliminary Study for a 
Numerical Aerodynamic Simulation Facility (NASF) as determined by Control Data Corporation. The 
document consists of five sections. Other than Section 1, Introduction, each section addresses a specific 
aspect of the Control Data study. 

Section 1 is a background of the Phase 1 study and what has been achieved since the issuance of the 
study. Section 2, Hardware Description, presents a more detailed definition of the FMP than that provided 
in the Phase 1 report. Section 3, Reliability Assessment, gives information on the methodology and 
analysis used to estimate failure of hardware components plus the estimated failure rate data. Section 4,
Software Description, expands previous discussions of proposed FMP software. Section 5, Appendixes,
presents both the Instruction and Functional Computer FMP Specifications plus appendixes on a Programmed
Device Controller and Serial Trunk Controller Procedures.

In addition to this Final Report, a separate Summary Report presents the salient findings of this
preliminary study and summarizes the extension of the first phase of a program for the development of a
Numerical Aerodynamic Simulation Facility.
Section 1


INTRODUCTION


Since early in the year 1977, the Ames Research Center of the National Aeronautics and Space 
Administration (NASA) and the Research and Advanced Design Laboratory (RADL) of Control Data 
Corporation have been conducting a cooperative research program in the investigation of the feasibility 
and applicability of extremely high performance computers to the process of airframe design. The first
phase of this effort culminated in a Final Report produced by Control Data for NASA in October 1977.
Like the first phase investigations themselves, that report attempted to answer several questions posed by
Ames researchers such as: 

“How much compute power is necessary to achieve the design and engineering goals of
NASA aerodynamicists?” Answer: “Some computing power in excess of one billion
floating-point computations per second.”

“Can such a computer be built for operation in the early 1980s, with acceptable reliability
and availability?” Answer: “Yes.”

“What are the architectural alternatives for such a computer?” Answer: “Either a pipeline
or array processor of the SIMD variety is best suited to the task.”

“What technologies are available for implementing such a high powered machine?”
Answer: “In practical supply or operating parameters, very few technologies. The computer
should be built with the mature and still extendable silicon emitter-coupled logic for speed.”

“What system architecture is suitable for the entire computational facility?” Answer: “A
distributed processing system similar to that implemented for the STAR-100 family.”

“What are the programming considerations for such a machine?” Answer: “FORTRAN
should be the computational programming language with extensions to permit specific access
to hardware parallelism, when necessary.”

The Phase 1 report consciously included much tutorial material to assist Ames personnel in perceiving the 
state of the computer design and construction art at the same level as Control Data deals with new 
computer developments. Included in the report were preliminary designs and descriptions of a possible 
candidate machine architecture, installation design, and programming approaches for the review and 
commentary of the Ames project staff. Discussions between Ames and Control Data researchers during
the preparation of draft material for the Final Report led to substantial revisions in the technical design 
of the language and machine structure. Subsequent to the publication of that report, similar discussions 
have led to even more changes in the implementation of the algorithm, language, and computer hardware. 

Since this initial effort was intended to be a part of an ongoing development project leading to the actual
design of all system components and to the construction of a facility to perform the aerodynamic
simulations, an extension of the study was indicated, preliminary to launching more refined design and simulation
of the system.

This study extension was intended to bridge the gap between the completion of the feasibility phase and
the second, detailed specification phase of the research effort.

PHASE 1 EXTENSION DESCRIPTION 

An examination of the state of the project after the Final Report for Phase 1 was conducted and the 
apparent needs for the Phase 2 stage were outlined. Several desirable tasks became evident from that 
analysis, from NASA commentary regarding the true feasibility of the proposed approach, and from NASA
questions regarding the broader applicability of the system being considered. In addition, a three-dimensional
form of the Ames “Implicit code” (whose 2-D form was used in the Phase 1 Final Report to evaluate the 
design) became available. Thus a form of the solution methodology closer to what is expected to be used 
in the final facility was at hand for analysis with the proposed system architecture. 

These factors led to the establishment of a formal “Phase 1 Extension” whose purpose was to continue 
the Phase 1 work in several areas and to provide the springboard for the Phase 2 work on detailed 
design, analysis, and specification of system components. To this end several tasks were identified: 

1. Analysis of the three-dimensional model operating on the proposed computer 
architecture. 

2. Development of a proposed programming language form. 

3. A more detailed description of the computational engine structure so that timing
and storage estimates could be made.

4. Development of tools to provide verification of:

a. the ability of a given design to perform the calculations in the time required;

b. the reality of creating hardware that will perform as described in (a.) above.

5. Description of the overall operating system function to support the computational workload. 

6. Refinement of reliability estimates for the computational hardware. 

With the resources available and the brief three-month period for completion, it was decided that all of
the tasks could not be completed to the point of detailed specification, or final structural design of either
hardware or software. Instead the Phase 1 Extension was conceived as a means for providing the structure
within which all subsequent phases would be carried out, and as a period when a number of proposed
language and hardware schemes could be reconciled with Ames staff members. Past experience indicates that
a development project of this magnitude is most effectively structured and controlled
through the identification and implementation of “standard” specifications for each
system component.

The basis for the report which follows is thus a skeleton structure of specifications for the computational 
hardware, the operating system, the language and its processor, and its reliability. In some cases sufficient 
detailed work was completed to permit the beginning of a complete specification. For example, since much
of the STAR-100A Scalar Unit and maintenance philosophy were to be employed in Control Data’s
proposed architecture, the STAR-100A specification was used as a basis for the FMP hardware specification.
This specification will be modified and updated as design continues through the Phase 2 and final
construction portions of this project.

An outline for an operating system specification was generated and used as the basis for a preliminary
system description of the overall operating system. In subsequent phases of the project, each of the
paragraphs in the outline will be replaced by specification information rather than the functional
requirement or preliminary design data provided in this report.

Although it was intended to provide a full specification for the language and compiler in this phase, the 
effort was beyond the means of RADL to accomplish in the three months allowed. This is partly due to 
the changes in direction for the language that have resulted from interaction with Ames staff and with 
Control Data’s language specialists. An initial attempt to base a specification on the ANS FORTRAN 77 
specification was abandoned as approval of that standard has been delayed, and as it became apparent that 
the specification as written could not be easily used as a guide for experimental programming. The use 
of the STAR-100 FORTRAN compiler documentation proved to be undesirable as Ames personnel insisted 
on abandoning some of the artificial constructs for vector processing that were felt to be difficult to use, 
or to understand. The result then, in this first pass, is a presentation of a “strawman” proposal for a 
FORTRAN language based on ANS FORTRAN 77, with several extensions designed for the flow model 
computations to be done at Ames. 

The process of providing tools for architectural and hardware verification has proceeded down divergent 
paths also. At the time of submission of this report, a tape is being provided to Ames which will allow
them to perform computerized evaluations of the behavior of the overall computational segment of the 
installation with varying forms of machine language coding. 

The report that follows then presents further proposals for the hardware architecture and is, in essence, a 
proposal for the form and content of specifications to be generated in full in the next phase of this project.
It is expected that these proposals will lead to discussions between Control Data and Ames within the 
coming months to arrive at a final architectural project structure and project management and control format 
based on documentation methodology and management. 

Unlike the Phase 1 report, this report contains no major answers to major questions. However, the feasibility 
of construction of the desired facility appears greater now than it did in Phase 1. Further design attempts 
to reduce the size and complexity of the major processor have improved chances of its being built with 
existing technology, thus reducing one of the significant risks highlighted in the Phase 1 study. The direction 
for the continuation of design and implementation of the total facility is becoming clearer and more practical 
as the cooperative study continues. 

RELATIONSHIP TO OVERALL PROJECT 

In the initial study phase, the computational engine was referred to as the NSS (Navier-Stokes Solver). The 
existence of such a massively powerful system, with the attendant major investments in facilities, money, and
personnel, mandates an examination of the broader applicability of such a system. Thus, to symbolize the
more catholic potential of this facility, the computational engine has been renamed the Flow Model Processor 
(FMP). The other computers in the system (regardless of computational capability) are called Front-End 
Processors (FEPs) to eliminate the need for identifying a particular brand or architecture, and the intelligent 
(programmable) communications equipment and attachment devices are called Programmable Device Controllers 
(PDCs). Regardless of the shape and content of the final system, these major components will exist in some 
form and thus the use of this terminology pervades this report. 
As stated in the previous section, the outlines of documentation in specification form are to establish the
structure for documentation for following stages of the project. If a fixed form of specification can be
agreed upon for each component (language, compiler, FMP monitor, and FMP hardware) then a rigorous
control system can be established in Phase 2 wherein all design changes are referenced to their particular
area of specification and an automated audit trail is provided for each specification update, thus tracing the
evolution of the system, and fixing responsibility (by designer) for any given change in the specifications.

It is at this stage of the project that such management devices must be created lest the project be
faced with virtual chaos in Phase 2 and later phases of the effort. 

Thus the interaction with NASA project managers is essential concerning the form (not the content) in 
which all technical matters are submitted for review and decision from this point forward. As stated in 
the previous section, the outlines or specification skeletons offered here are not expected to be all inclusive 
but to form the basis for detailed discussions at a later time. 
Section 2


HARDWARE DESCRIPTION


PERFORMANCE METRICS 

For this study the three-dimensional forms of the implicit and explicit codes were submitted to Control 
Data for additional evaluation of the FMP design that has evolved since Phase 1. The Phase 1 effort of 
code analysis was primarily directed to the implicit form because its computational behavior was easier to 
estimate since the number of times arithmetic is performed is fixed rather than somewhat data dependent,
as happens in the explicit form of the Navier-Stokes solution. The emphasis in this study was extended 
to the three-dimensional code primarily because of the data already derived for the two-dimensional model 
described in the Phase 1 study. 


ANALYSIS OF VECTORIZED 3-D MODELS 


The analysis of the current FORTRAN codes undertaken in this phase was intended to accomplish the 
following items: 

A. Updating of statistical data on computational behavior of the code for the three- 
dimensional versions to contrast with the data taken from the STAR-100 in 
Phase 1 of this study. 

B. Identification of the key areas of the codes to isolate benchmark candidates for
measuring performance.

C. Experimentation with various forms of source language coding to produce vectorization.

D. Development of segments of code that could be ‘hand-compiled’ for the FMP to illustrate
the machine-language-level execution of portions of the computation.

E. Analysis of memory access patterns induced by the code in three dimensions. 

An inordinate amount of project resources was quickly absorbed in the redesign of some portions of the
FMP and in pursuing several alternatives for the programming language. Thus the objectives of this section 
of the study became redirected to match the time available, and to answer some more pressing questions. 
The results of each original objective are as follows: 

A. No statistical counting was done for either the three-dimensional implicit or explicit
codes on the STAR-100. Computational counts for the implicit code, which were projected
for three dimensions in the Phase 1 report, appear to roughly match the 3-D
implicit version now in hand.

B. It is obvious to the casual analyst that the region of code in the 3-D implicit program
containing the AMATRX, FILTRX, FILTRY, FILTRZ and three subroutine calls to the
metric computations XXM, YYM, ZZM constitutes the critical area for analysis of the model.
From Table 5-40 of the Phase 1 report, the ‘left-hand-side’ calculations including
those code sequences account for 19,779,456 operations out of a total of 24,666,130
operations, or about 80 percent of the total operations. In addition, the ‘sweeps’ made
in this code in the three directions of the matrix create all of the expected access
patterns that need to be analyzed for the FMP. No effort was expended on a similar
analysis of the explicit code due to lack of resources and time. The original scalar
coding in FORTRAN of these segments is given again in Figure 2-1.
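
The 80 percent figure in item B follows directly from the two operation counts quoted from Table 5-40. The check below uses only those two numbers; the variable names are illustrative.

```python
lhs_ops = 19_779_456      # 'left-hand-side' operations, as quoted from Table 5-40
total_ops = 24_666_130    # total operations, as quoted from Table 5-40

fraction = lhs_ops / total_ops
remaining_ops = total_ops - lhs_ops

print(f"{100 * fraction:.1f} percent left-hand side, {remaining_ops} operations elsewhere")
```

The left-hand-side share is a little over four fifths, so a benchmark built from these sequences covers the large majority of the arithmetic in the implicit code.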
      DO 12 J = 1,JMAX
      R1 = XX(J,1)*HDX
      R2 = XX(J,2)*HDX
      R3 = XX(J,3)*HDX
      R4 = XX(J,4)*HDX
C
C******AMATRX
C
      RR = 1./Q(KL,1,J)
      U = Q(KL,2,J)*RR
      V = Q(KL,3,J)*RR
      W = Q(KL,4,J)*RR
      UU = U*R1+V*R2+W*R3
      UT = U**2+V**2+W**2
      C1 = GAMI*UT*.5
      C2 = Q(KL,5,J)*RR*GAMMA
      C3 = C2 - C1
      C4 = R4+UU
      C5 = GAMI*U
      C6 = GAMI*V
      C7 = GAMI*W
      D(J,1,1) = R4
      D(J,1,2) = R1
      D(J,1,3) = R2
      D(J,1,4) = R3
      D(J,1,5) = 0.
      D(J,2,1) = R1*C1 - U*UU
      D(J,2,2) = C4+R1*GAM2*U
      D(J,2,3) = -R1*C6+R2*U
      D(J,2,4) = -R1*C7+R3*U
      D(J,2,5) = R1*GAMI
      D(J,3,1) = R2*C1-V*UU
      D(J,3,2) = R1*V-R2*C5
      D(J,3,3) = C4+R2*GAM2*V
      D(J,3,4) = -R2*C7+R3*V
      D(J,3,5) = R2*GAMI
      D(J,4,1) = R3*C1-W*UU
      D(J,4,2) = R1*W-R3*C5
      D(J,4,3) = R2*W-R3*C6
      D(J,4,4) = C4+R3*GAM2*W
      D(J,4,5) = R3*GAMI
      D(J,5,1) = (-C2+2.*C1)*UU
      D(J,5,2) = R1*C3-C5*UU
      D(J,5,3) = R2*C3-C6*UU
      D(J,5,4) = R3*C3-C7*UU
      D(J,5,5) = R4+GAMMA*UU
C
C******END OF AMATRX
C
   12 CONTINUE


Figure 2-1. Scalar Code Taken From STEP of 3-D Code 



      DO 25 J=JA,JB 
      RJ = 1./Q(KL,6,J) 
      RMJ = RM*RJ 
      RR = RMJ*Q(KL,6,J-1) 
      RF = RMJ*Q(KL,6,J+1) 
      DO 23 N=1,5 
      A(J,N,1) = -D(J-1,N,1) 
      A(J,N,2) = -D(J-1,N,2) 
      A(J,N,3) = -D(J-1,N,3) 
      A(J,N,4) = -D(J-1,N,4) 
      A(J,N,5) = -D(J-1,N,5) 
      B(J,N,1) = 0.0 
      B(J,N,2) = 0.0 
      B(J,N,3) = 0.0 
      B(J,N,4) = 0.0 
      B(J,N,5) = 0.0 
      C(J,N,1) = D(J+1,N,1) 
      C(J,N,2) = D(J+1,N,2) 
      C(J,N,3) = D(J+1,N,3) 
      C(J,N,4) = D(J+1,N,4) 
      C(J,N,5) = D(J+1,N,5) 
      A(J,N,N) = A(J,N,N)-RR 
      B(J,N,N) = C8 
      C(J,N,N) = C(J,N,N)-RF 
   23 F(J,N)=S(KL,N,J) 
   25 CONTINUE 
C 
C*****END OF FILTRX 
C S MUST BE ZERO ON B.C. 
      CALL BTRI(2,JM) 
      DO 21 J = 2,JM 
      S(KL,1,J) = F(J,1) 
      S(KL,2,J) = F(J,2) 
      S(KL,3,J) = F(J,3) 
      S(KL,4,J) = F(J,4) 
   21 S(KL,5,J) = F(J,5) 
   20 CONTINUE 


Figure 2-1. Scalar Code Taken From STEP of 3-D Code (Cont.) 
      COMMON/VAR0/S(720,5,30) 
      COMMON/VAR1/X(720,30),Y(720,30),Z(720,30) 
      COMMON/VAR3/P(120,30),XX(60,4),YY(60,4),ZZ(60,4) 
      LEVEL 2,Q,S,X,Y,Z 

COMMON/C'OUNT/NC^Cl 

      COMMON/FLSH/DX2,DY2,DZ2 

C 

C XI METRICS FORMED FOR A K,L LINE IN J 

C 

C 

C SYMMETRY 
C 

      K = KA 
      L = LA 
      J1=J1A 
      J2=J2A 
      KL = (L-1)*ND+K 
      DO 10 J = J1,J2 
      RJ = Q(KL,6,J) 
      IF(K.EQ.1) GO TO 50 
      IF(K.EQ.KMAX) GO TO 51 
      XK = (X(KL+1,J)-X(KL-1,J))*DY2 
      YK = (Y(KL+1,J)-Y(KL-1,J))*DY2 
      ZK = (Z(KL+1,J)-Z(KL-1,J))*DY2 
      GO TO 72 
   50 CONTINUE 
      XK = (-3.*X(KL,J)+4.*X(KL+1,J)-X(KL+2,J))*DY2 
      YK = (-3.*Y(KL,J)+4.*Y(KL+1,J)-Y(KL+2,J))*DY2 
      ZK = (-3.*Z(KL,J)+4.*Z(KL+1,J)-Z(KL+2,J))*DY2 
      GO TO 72 
   51 CONTINUE 
      XK = (3.*X(KL,J)-4.*X(KL-1,J)+X(KL-2,J))*DY2 
      YK = (3.*Y(KL,J)-4.*Y(KL-1,J)+Y(KL-2,J))*DY2 
      ZK = (3.*Z(KL,J)-4.*Z(KL-1,J)+Z(KL-2,J))*DY2 
   72 CONTINUE 
      IF(L.EQ.1) GO TO 52 
      IF(L.EQ.LMAX) GO TO 53 
      XL = (X(KL+ND,J)-X(KL-ND,J))*DZ2 
      YL = (Y(KL+ND,J)-Y(KL-ND,J))*DZ2 
      ZL = (Z(KL+ND,J)-Z(KL-ND,J))*DZ2 
      GO TO 60 
   52 CONTINUE 
      XL = (-3.*X(KL,J)+4.*X(KL+ND,J)-X(KL+2*ND,J))*DZ2 
      YL = (-3.*Y(KL,J)+4.*Y(KL+ND,J)-Y(KL+2*ND,J))*DZ2 
      ZL = (-3.*Z(KL,J)+4.*Z(KL+ND,J)-Z(KL+2*ND,J))*DZ2 
      GO TO 60 
   53 CONTINUE 
      XL = (3.*X(KL,J)-4.*X(KL-ND,J)+X(KL-2*ND,J))*DZ2 
      YL = (3.*Y(KL,J)-4.*Y(KL-ND,J)+Y(KL-2*ND,J))*DZ2 
      ZL = (3.*Z(KL,J)-4.*Z(KL-ND,J)+Z(KL-2*ND,J))*DZ2 
   60 CONTINUE 
      XX(J,1) = (YK*ZL-ZK*YL)*RJ 
      XX(J,2) = (ZK*XL-XK*ZL)*RJ 
      XX(J,3) = (XK*YL-YK*XL)*RJ 
XX(J,4) = -OMEGA*(Z(KL^)*XX(j;j>Y(KL^)*XX{J,3)) 
10 CONTINUE 
RETURN 
END 


Figure 2-1. Scalar Code Taken From STEP of 3-D Code (Cont.) 


      SUBROUTINE BTRI(ILA,IUA) 
      COMMON/BTRID/A(60,5,5),B(60,5,5),C(60,5,5),D(60,5,5),F(60,5) 
      DIMENSION H(5,5) 
      REAL L11,L21,L22,L31,L32,L33,L41,L42,L43,L44,L51,L52,L53,L54,L55 
      IL=ILA 
      IU=IUA 
      IS=IL+1 
      IE=IU-1 
C INSERT LUDEC 
      L11=1./B(IL,1,1) 
      L21=B(IL,2,1) 
      U12=B(IL,1,2)*L11 
      L22=1./(B(IL,2,2)-L21*U12) 
      U13=B(IL,1,3)*L11 
      U14=B(IL,1,4)*L11 
      U15=B(IL,1,5)*L11 
      L31=B(IL,3,1) 
      L32=B(IL,3,2)-L31*U12 
      U23=(B(IL,2,3)-L21*U13)*L22 
      L33=1./(B(IL,3,3)-U13*L31-U23*L32) 
      U24=(B(IL,2,4)-L21*U14)*L22 
      U25=(B(IL,2,5)-L21*U15)*L22 
      L41=B(IL,4,1) 
      L42=B(IL,4,2)-L41*U12 
      L43=B(IL,4,3)-L41*U13-L42*U23 
      U34=(B(IL,3,4)-L31*U14-L32*U24)*L33 
      L44=1./(B(IL,4,4)-U14*L41-U24*L42-U34*L43) 
      U35=(B(IL,3,5)-L31*U15-L32*U25)*L33 
      L51=B(IL,5,1) 
      L52=B(IL,5,2)-L51*U12 
      L53=B(IL,5,3)-L51*U13-L52*U23 
      L54=B(IL,5,4)-L51*U14-L52*U24-L53*U34 
      U45=(B(IL,4,5)-L41*U15-L42*U25-L43*U35)*L44 
      L55=1./(B(IL,5,5)-L51*U15-L52*U25-L53*U35-L54*U45) 
C COMPUTE LITTLE R S 
      D1=L11*F(IL,1) 
      D2=L22*(F(IL,2)-L21*D1) 
      D3=L33*(F(IL,3)-L31*D1-L32*D2) 
      D4=L44*(F(IL,4)-L41*D1-L42*D2-L43*D3) 
      D5=L55*(F(IL,5)-L51*D1-L52*D2-L53*D3-L54*D4) 
C COMPUTE BIG R S 
      F(IL,5)=D5 
      F(IL,4)=D4-U45*D5 
      F(IL,3)=D3-U34*F(IL,4)-U35*D5 
      F(IL,2)=D2-U23*F(IL,3)-U24*F(IL,4)-U25*D5 
      F(IL,1)=D1-U12*F(IL,2)-U13*F(IL,3)-U14*F(IL,4)-U15*D5 


Figure 2-1. Scalar Code Taken From STEP of 3-D Code (Cont.) 
C COMPUTE C PRIME FOR FIRST ROW 
      DO 12 M=1,5 
      D1=L11*C(IL,1,M) 
      D2=L22*(C(IL,2,M)-L21*D1) 
      D3=L33*(C(IL,3,M)-L31*D1-L32*D2) 
      D4=L44*(C(IL,4,M)-L41*D1-L42*D2-L43*D3) 
      D5=L55*(C(IL,5,M)-L51*D1-L52*D2-L53*D3-L54*D4) 
      B(IL,5,M)=D5 
      B(IL,4,M)=D4-U45*D5 
      B(IL,3,M)=D3-U34*B(IL,4,M)-U35*D5 
      B(IL,2,M)=D2-U23*B(IL,3,M)-U24*B(IL,4,M)-U25*D5 
   12 B(IL,1,M)=D1-U12*B(IL,2,M)-U13*B(IL,3,M)-U14*B(IL,4,M)-U15*D5 
      DO 13 I=IS,IE 
C COMPUTE B PRIME*BIGR 
      DO 14 N=1,5 
   14 F(I,N)=F(I,N)-A(I,N,1)*F(I-1,1)-A(I,N,2)*F(I-1,2)-A(I,N,3)*F(I-1,3) 
     1 -A(I,N,4)*F(I-1,4)-A(I,N,5)*F(I-1,5) 
C COMPUTE B PRIME 
      DO 11 N=1,5 
      DO 11 M=1,5 
   11 H(N,M)=B(I,N,M)-A(I,N,1)*B(I-1,1,M)-A(I,N,2)*B(I-1,2,M)-A(I,N,3)* 
     1 B(I-1,3,M)-A(I,N,4)*B(I-1,4,M)-A(I,N,5)*B(I-1,5,M) 
C INSERT LUDEC AGAIN 
      L11=1./H(1,1) 
      L21=H(2,1) 
      U12=H(1,2)*L11 
      L22=1./(H(2,2)-L21*U12) 
      U13=H(1,3)*L11 
      U14=H(1,4)*L11 
      U15=H(1,5)*L11 
      L31=H(3,1) 
      L32=H(3,2)-L31*U12 
      U23=(H(2,3)-L21*U13)*L22 
      L33=1./(H(3,3)-U13*L31-U23*L32) 
      U24=(H(2,4)-L21*U14)*L22 
      U25=(H(2,5)-L21*U15)*L22 
      L41=H(4,1) 
      L42=H(4,2)-L41*U12 
      L43=H(4,3)-L41*U13-L42*U23 
      U34=(H(3,4)-L31*U14-L32*U24)*L33 
      L44=1./(H(4,4)-U14*L41-U24*L42-U34*L43) 
      U35=(H(3,5)-L31*U15-L32*U25)*L33 
      L51=H(5,1) 
      L52=H(5,2)-L51*U12 
      L53=H(5,3)-L51*U13-L52*U23 
      L54=H(5,4)-L51*U14-L52*U24-L53*U34 
      U45=(H(4,5)-L41*U15-L42*U25-L43*U35)*L44 
      L55=1./(H(5,5)-L51*U15-L52*U25-L53*U35-L54*U45) 
C COMPUTE LITTLE R S 
      D1=L11*F(I,1) 
      D2=L22*(F(I,2)-L21*D1) 
      D3=L33*(F(I,3)-L31*D1-L32*D2) 
      D4=L44*(F(I,4)-L41*D1-L42*D2-L43*D3) 
      D5=L55*(F(I,5)-L51*D1-L52*D2-L53*D3-L54*D4) 


Figure 2-1. Scalar Code Taken From STEP of 3-D Code (Cont.) 
C COMPUTE BIG R S 
      F(I,5)=D5 
      F(I,4)=D4-U45*D5 
      F(I,3)=D3-U34*F(I,4)-U35*D5 
      F(I,2)=D2-U23*F(I,3)-U24*F(I,4)-U25*D5 
      F(I,1)=D1-U12*F(I,2)-U13*F(I,3)-U14*F(I,4)-U15*D5 
C COMPUTE C PRIMES 
      DO 15 M=1,5 
      D1=L11*C(I,1,M) 
      D2=L22*(C(I,2,M)-L21*D1) 
      D3=L33*(C(I,3,M)-L31*D1-L32*D2) 
      D4=L44*(C(I,4,M)-L41*D1-L42*D2-L43*D3) 
      D5=L55*(C(I,5,M)-L51*D1-L52*D2-L53*D3-L54*D4) 
      B(I,5,M)=D5 
      B(I,4,M)=D4-U45*D5 
      B(I,3,M)=D3-U34*B(I,4,M)-U35*D5 
      B(I,2,M)=D2-U23*B(I,3,M)-U24*B(I,4,M)-U25*D5 
   15 B(I,1,M)=D1-U12*B(I,2,M)-U13*B(I,3,M)-U14*B(I,4,M)-U15*D5 
   13 CONTINUE 
      I=IU 
C COMPUTE B PRIME*BIG R FOR LAST ROW 
      DO 17 N=1,5 
   17 F(I,N)=F(I,N)-A(I,N,1)*F(I-1,1)-A(I,N,2)*F(I-1,2)-A(I,N,3)* 
     1 F(I-1,3)-A(I,N,4)*F(I-1,4)-A(I,N,5)*F(I-1,5) 
C COMPUTE B PRIME 
      DO 18 N=1,5 
      DO 18 M=1,5 
   18 H(N,M)=B(I,N,M)-A(I,N,1)*B(I-1,1,M)-A(I,N,2)*B(I-1,2,M)-A(I,N,3)* 
     1 B(I-1,3,M)-A(I,N,4)*B(I-1,4,M)-A(I,N,5)*B(I-1,5,M) 
C INSERT LUDEC AGAIN 
      L11=1./H(1,1) 
      L21=H(2,1) 
      U12=H(1,2)*L11 
      L22=1./(H(2,2)-L21*U12) 
      U13=H(1,3)*L11 
      U14=H(1,4)*L11 
      U15=H(1,5)*L11 
      L31=H(3,1) 
      L32=H(3,2)-L31*U12 
      U23=(H(2,3)-L21*U13)*L22 
      L33=1./(H(3,3)-U13*L31-U23*L32) 
      U24=(H(2,4)-L21*U14)*L22 
      U25=(H(2,5)-L21*U15)*L22 
      L41=H(4,1) 
      L42=H(4,2)-L41*U12 
      L43=H(4,3)-L41*U13-L42*U23 
      U34=(H(3,4)-L31*U14-L32*U24)*L33 
      L44=1./(H(4,4)-U14*L41-U24*L42-U34*L43) 
      U35=(H(3,5)-L31*U15-L32*U25)*L33 
      L51=H(5,1) 
      L52=H(5,2)-L51*U12 
      L53=H(5,3)-L51*U13-L52*U23 
      L54=H(5,4)-L51*U14-L52*U24-L53*U34 
      U45=(H(4,5)-L41*U15-L42*U25-L43*U35)*L44 
      L55=1./(H(5,5)-L51*U15-L52*U25-L53*U35-L54*U45) 


Figure 2-1. Scalar Code Taken From STEP of 3-D Code (Cont.) 
C COMPUTE LITTLE R S 
      D1=L11*F(I,1) 
      D2=L22*(F(I,2)-L21*D1) 
      D3=L33*(F(I,3)-L31*D1-L32*D2) 
      D4=L44*(F(I,4)-L41*D1-L42*D2-L43*D3) 
      D5=L55*(F(I,5)-L51*D1-L52*D2-L53*D3-L54*D4) 
C COMPUTE BIG R S 
      F(I,5)=D5 
      F(I,4)=D4-U45*D5 
      F(I,3)=D3-U34*F(I,4)-U35*D5 
      F(I,2)=D2-U23*F(I,3)-U24*F(I,4)-U25*D5 
      F(I,1)=D1-U12*F(I,2)-U13*F(I,3)-U14*F(I,4)-U15*D5 
      I=IU 
   20 I = I-1 
      DO 19 N=1,5 
   19 F(I,N)=F(I,N)-F(I+1,1)*B(I,N,1)-F(I+1,2)*B(I,N,2)-F(I+1,3)*B(I,N,3) 
     1 -F(I+1,4)*B(I,N,4)-F(I+1,5)*B(I,N,5) 
      IF (I.GT.IL) GO TO 20 
      RETURN 
      END 


Figure 2-1. Scalar Code Taken From STEP of 3-D Code (Cont.) 
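The structure of BTRI (forward elimination with block LU factorization, then back substitution) can be summarized in a short sketch. This is not a translation of the listing: it replaces the hand-unrolled 5 by 5 LUDEC sequences with a generic small dense solver, works for any block size, and every function name in it is hypothetical. It is meant only to show the order in which the "B prime", "C prime" and right-hand-side updates occur.

```python
def solve_dense(m, rhs):
    """Solve m * x = rhs (rhs may have several columns) by Gauss-Jordan
    elimination with partial pivoting. Stands in for the unrolled LUDEC code."""
    n = len(m)
    aug = [list(m[r]) + list(rhs[r]) for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def matmul(a, b):
    """Plain dense matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def matsub(a, b):
    """Element-wise difference of two equally shaped matrices."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def block_tridiag_solve(A, B, C, F):
    """Solve A[i] x[i-1] + B[i] x[i] + C[i] x[i+1] = F[i] for all rows i.

    A[0] and C[-1] are ignored. F[i] and the returned x[i] are flat vectors.
    """
    n = len(B)
    c_prime = [None] * n
    f_prime = [None] * n
    for i in range(n):
        rhs = [[v] for v in F[i]]
        h = B[i]
        if i > 0:
            # "B prime * big R" and "B prime" steps of the listing.
            rhs = matsub(rhs, matmul(A[i], f_prime[i - 1]))
            h = matsub(B[i], matmul(A[i], c_prime[i - 1]))
        f_prime[i] = solve_dense(h, rhs)
        if i < n - 1:
            c_prime[i] = solve_dense(h, C[i])  # "C prime" step
    # Back substitution, last row first.
    x = [None] * n
    x[n - 1] = f_prime[n - 1]
    for i in range(n - 2, -1, -1):
        x[i] = matsub(f_prime[i], matmul(c_prime[i], x[i + 1]))
    return [[row[0] for row in xi] for xi in x]
```

The generic solver trades speed for brevity; the listing's fully unrolled factorization avoids loop overhead and pivoting, which is the property that matters for the vector recoding discussed next.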


C. The segments of code referred to in B above were coded in a variety of FORTRAN 
dialects. The result was the proposed set of FORTRAN extensions found in 
Section 4 of this report. Figure 2-2 presents a moderate recoding 
of the first of the three sweeps of the left-hand-side code being analyzed, as an example of 
how such code might appear to the applications programmer. The implicit code 
is admittedly simple in structure and creates no data-dependent vector operations of 
any magnitude. Its vector coding can therefore be done in a 
very straightforward manner, using the CODO constructs (see Section 4) almost exclusively 
to provide optimum processing on the FMP. 

D. An effort was initiated late in this phase to hand compile the entire key segment of the 
implicit code. The delay in initiating this effort arose from several fruitless attempts 
to over-enrich the FORTRAN syntax to support the FMP. As a result, 
only a small segment could be prepared for publication in the time remaining. Figure 2-3 
gives the hand compilation example for lines 100 through 170 of the code shown in 
Figure 2-2. 
100=      DO 20 L=2,LMAX-1 
110=C***FILTRX 
120=C 
130=      CODO J=1,JMAX; K=2,KMAX 
140=      RJ=Q(K,L,6,J) 
150=      XK=(X(K+1,L,J)-X(K-1,L,J))*DY2 
160=      YK=(Y(K+1,L,J)-Y(K-1,L,J))*DY2 
170=      ZK=(Z(K+1,L,J)-Z(K-1,L,J))*DY2 
180=      XL=(X(K,L+1,J)-X(K,L-1,J))*DZ2 
190=      YL=(Y(K,L+1,J)-Y(K,L-1,J))*DZ2 
200=      ZL=(Z(K,L+1,J)-Z(K,L-1,J))*DZ2 
210=      D(J,1,2)=HDX*((YK*ZL-ZK*YL)*RJ) 
220=      D(J,1,1)=HDX*(-OMEGA*(Z(K,L,J)*(RJ*(YK*ZL-ZK*YL)) 
230=     1 -Y(K,L,J)*RJ*(XK*YL-YK*XL))) 
240=      D(J,1,4)=HDX*((XK*YL-YK*XL)*RJ) 
250=      D(J,1,3)=HDX*((ZK*XL-XK*ZL)*RJ) 
260=C 
270=C*****AMATRX 
280=C 
290=      RR = 1./Q(K,L,1,J) 
300=      U = Q(K,L,2,J)*RR 
310=      V = Q(K,L,3,J)*RR 
320=      W = Q(K,L,4,J)*RR 
330=      UU = U*D(J,1,2)+V*D(J,1,3)+W*D(J,1,4) 
340=      UT = U**2+V**2+W**2 
350=C 
360=      C1 = GAMI*UT*.5 
370=      C2 = Q(K,L,5,J)*RR*GAMMA 
380=      C3=C2-C1 
390=      C4=D(J,1,1)+UU 
400=      C5=GAMI*U 
410=      C6=GAMI*V 
420=      C7=GAMI*W 
430=      D(J,1,5) = 0. 
440=      D(J,2,1) = D(J,1,2)*C1-U*UU 
450=      D(J,2,2) = C4+D(J,1,2)*GAM2*U 
460=      D(J,2,3) = -D(J,1,2)*C6+D(J,1,3)*U 
470=      D(J,2,4) = -D(J,1,2)*C7+D(J,1,4)*U 
480=      D(J,2,5) = D(J,1,2)*GAMI 
490=      D(J,3,1) = D(J,1,3)*C1-V*UU 
500=      D(J,3,2) = D(J,1,2)*V-D(J,1,3)*C5 
510=      D(J,3,3) = C4+D(J,1,3)*GAM2*V 
520=      D(J,3,4) = -D(J,1,3)*C7+D(J,1,4)*V 
530=      D(J,3,5) = D(J,1,3)*GAMI 
540=      D(J,4,1) = D(J,1,4)*C1-W*UU 
550=      D(J,4,2) = D(J,1,2)*W-D(J,1,4)*C5 
560=      D(J,4,3) = D(J,1,3)*W-D(J,1,4)*C6 
570=      D(J,4,4) = C4+D(J,1,4)*GAM2*W 
580=      D(J,4,5) = D(J,1,4)*GAMI 
590=      D(J,5,1) = (-C2+2.*C1)*UU 
600=      D(J,5,2) = D(J,1,2)*C3-C5*UU 
610=      D(J,5,3) = D(J,1,3)*C3-C6*UU 
620=      D(J,5,4) = D(J,1,4)*C3-C7*UU 
630=      D(J,5,5) = D(J,1,1)+GAMMA*UU 
640=C*****END OF AMATRX 
650=C 
660=      ENDCD 


Figure 2-2. Proposed Recoding of Scalar 3-D Code Taken From STEP 
C 
      CODO J=2,JMAX-1; N=1,5; K=2,KMAX 
      A(J,N,1) = -D(J-1,N,1) 
      A(J,N,2) = -D(J-1,N,2) 
      A(J,N,3) = -D(J-1,N,3) 
      A(J,N,4) = -D(J-1,N,4) 
      A(J,N,5) = -D(J-1,N,5) 
      B(J,N,1) = 0.0 
      B(J,N,2) = 0.0 
      B(J,N,3) = 0.0 
      B(J,N,4) = 0.0 
      B(J,N,5) = 0.0 
      C(J,N,1) = D(J+1,N,1) 
      C(J,N,2) = D(J+1,N,2) 
      C(J,N,3) = D(J+1,N,3) 
      C(J,N,4) = D(J+1,N,4) 
      C(J,N,5) = D(J+1,N,5) 
      ENDCD 
C 
      CODO J=2,JMAX-1; N=1,5; K=2,KMAX 
      RMJ=RM/RJ 
      A(J,N,N)=A(J,N,N)-RMJ*Q(K,L,6,J-1) 
      B(J,N,N) = C8 
      C(J,N,N)=C(J,N,N)-RMJ*Q(K,L,6,J+1) 
      F(J,N)=S(K,L,N,J) 
      ENDCD 
C 


Figure 2-2. Proposed Recoding of Scalar 3-D Code Taken From STEP (Cont.) 



C END OF FILTRX 
C S MUST BE ZERO ON B.C. 
C INSERT LUDEC 
      CODO K=2,KMAX-1 
      L11=1./B(2,1,1) 
      L21=B(2,2,1) 
      U12=B(2,1,2)*L11 
      L22=1./(B(2,2,2)-L21*U12) 
      U13=B(2,1,3)*L11 
      U14=B(2,1,4)*L11 
      U15=B(2,1,5)*L11 
      L31=B(2,3,1) 
      L32=B(2,3,2)-L31*U12 
      U23=(B(2,2,3)-L21*U13)*L22 
      L33=1./(B(2,3,3)-U13*L31-U23*L32) 
      U24=(B(2,2,4)-L21*U14)*L22 
      U25=(B(2,2,5)-L21*U15)*L22 
      L41=B(2,4,1) 
      L42=B(2,4,2)-L41*U12 
      L43=B(2,4,3)-L41*U13-L42*U23 
      U34=(B(2,3,4)-L31*U14-L32*U24)*L33 
      L44=1./(B(2,4,4)-U14*L41-U24*L42-U34*L43) 
      U35=(B(2,3,5)-L31*U15-L32*U25)*L33 
      L51=B(2,5,1) 
      L52=B(2,5,2)-L51*U12 
      L53=B(2,5,3)-L51*U13-L52*U23 
      L54=B(2,5,4)-L51*U14-L52*U24-L53*U34 
      U45=(B(2,4,5)-L41*U15-L42*U25-L43*U35)*L44 
      L55=1./(B(2,5,5)-L51*U15-L52*U25-L53*U35-L54*U45) 
C COMPUTE LITTLE R S 
      D1=L11*F(2,1) 
      D2=L22*(F(2,2)-L21*D1) 
      D3=L33*(F(2,3)-L31*D1-L32*D2) 
      D4=L44*(F(2,4)-L41*D1-L42*D2-L43*D3) 
      D5=L55*(F(2,5)-L51*D1-L52*D2-L53*D3-L54*D4) 
C COMPUTE BIG R S 
      F(2,5)=D5 
      F(2,4)=D4-U45*D5 
      F(2,3)=D3-U34*F(2,4)-U35*D5 
      F(2,2)=D2-U23*F(2,3)-U24*F(2,4)-U25*D5 
      F(2,1)=D1-U12*F(2,2)-U13*F(2,3)-U14*F(2,4)-U15*D5 
      ENDCD 


Figure 2-2. Proposed Recoding of Scalar 3-D Code Taken From STEP (Cont.) 
C COMPUTE C PRIME FOR FIRST ROW 
      CODO M=1,5;K=2,KMAX-1 
      D1=L11*C(2,1,M) 
      D2=L22*(C(2,2,M)-L21*D1) 
      D3=L33*(C(2,3,M)-L31*D1-L32*D2) 
      D4=L44*(C(2,4,M)-L41*D1-L42*D2-L43*D3) 
      D5=L55*(C(2,5,M)-L51*D1-L52*D2-L53*D3-L54*D4) 
      B(2,5,M)=D5 
      B(2,4,M)=D4-U45*D5 
      B(2,3,M) = D3-U34*B(2,4,M)-U35*D5 
      B(2,2,M) = D2-U23*B(2,3,M)-U24*B(2,4,M)-U25*D5 
      B(2,1,M) = D1-U12*B(2,2,M)-U13*B(2,3,M)-U14*B(2,4,M) 
     1 -U15*D5 
      ENDCD 
      DO 13 I=3,JMAX-2 
C COMPUTE B PRIME*BIGR 
      CODO N=1,5;K=2,KMAX-1 
      F(I,N)=F(I,N)-A(I,N,1)*F(I-1,1)-A(I,N,2)*F(I-1,2) 
     1 -A(I,N,3)*F(I-1,3)-A(I,N,4)*F(I-1,4)-A(I,N,5)*F(I-1,5) 
      ENDCD 
C COMPUTE B PRIME 
      CODO N=1,5;M=1,5;K=2,KMAX-1 
      H(N,M)=B(I,N,M)-A(I,N,1)*B(I-1,1,M)-A(I,N,2)*B(I-1,2,M) 
     1 -A(I,N,3)*B(I-1,3,M)-A(I,N,4)*B(I-1,4,M)-A(I,N,5)*B(I-1, 
     2 5,M) 
      ENDCD 


Figure 2-2. Proposed Recoding of Scalar 3-D Code Taken From STEP (Cont.) 
C INSERT LUDEC AGAIN 
      CODO K=2,KMAX-1 
      L11=1./H(1,1) 
      L21=H(2,1) 
      U12=H(1,2)*L11 
      L22=1./(H(2,2)-L21*U12) 
      U13=H(1,3)*L11 
      U14=H(1,4)*L11 
      U15=H(1,5)*L11 
      L31=H(3,1) 
      L32=H(3,2)-L31*U12 
      U23=(H(2,3)-L21*U13)*L22 
      L33=1./(H(3,3)-U13*L31-U23*L32) 
      U24=(H(2,4)-L21*U14)*L22 
      U25=(H(2,5)-L21*U15)*L22 
      L41=H(4,1) 
      L42=H(4,2)-L41*U12 
      L43=H(4,3)-L41*U13-L42*U23 
      U34=(H(3,4)-L31*U14-L32*U24)*L33 
      L44=1./(H(4,4)-U14*L41-U24*L42-U34*L43) 
      U35=(H(3,5)-L31*U15-L32*U25)*L33 
      L51=H(5,1) 
      L52=H(5,2)-L51*U12 
      L53=H(5,3)-L51*U13-L52*U23 
      L54=H(5,4)-L51*U14-L52*U24-L53*U34 
      U45=(H(4,5)-L41*U15-L42*U25-L43*U35)*L44 
      L55=1./(H(5,5)-L51*U15-L52*U25-L53*U35-L54*U45) 
C COMPUTE LITTLE R S 
      D1=L11*F(I,1) 
      D2=L22*(F(I,2)-L21*D1) 
      D3=L33*(F(I,3)-L31*D1-L32*D2) 
      D4=L44*(F(I,4)-L41*D1-L42*D2-L43*D3) 
      D5=L55*(F(I,5)-L51*D1-L52*D2-L53*D3-L54*D4) 
C COMPUTE BIG R S 
      F(I,5)=D5 
      F(I,4)=D4-U45*D5 
      F(I,3)=D3-U34*F(I,4)-U35*D5 
      F(I,2)=D2-U23*F(I,3)-U24*F(I,4)-U25*D5 
      F(I,1)=D1-U12*F(I,2)-U13*F(I,3)-U14*F(I,4)-U15*D5 
      ENDCD 
C COMPUTE C PRIMES 
      CODO M=1,5;K=2,KMAX-1 
      D1=L11*C(I,1,M) 
      D2=L22*(C(I,2,M)-L21*D1) 
      D3=L33*(C(I,3,M)-L31*D1-L32*D2) 
      D4=L44*(C(I,4,M)-L41*D1-L42*D2-L43*D3) 
      D5=L55*(C(I,5,M)-L51*D1-L52*D2-L53*D3-L54*D4) 
      B(I,5,M)=D5 
      B(I,4,M)=D4-U45*D5 
      B(I,3,M) = D3-U34*B(I,4,M)-U35*D5 
      B(I,2,M) = D2-U23*B(I,3,M)-U24*B(I,4,M)-U25*D5 
      B(I,1,M) = D1-U12*B(I,2,M)-U13*B(I,3,M)-U14*B(I,4,M) 
     1 -U15*D5 
      ENDCD 


Figure 2-2. Proposed Recoding of Scalar 3-D Code Taken From STEP (Cont.) 
   13 CONTINUE 
C 
      I=JMAX-1 
C COMPUTE B PRIME*BIG R FOR LAST ROW 
C 
      CODO N=1,5;K=2,KMAX-1 
      F(I,N)=F(I,N)-A(I,N,1)*F(I-1,1)-A(I,N,2)*F(I-1,2) 
     1 -A(I,N,3)*F(I-1,3)-A(I,N,4)*F(I-1,4)-A(I,N,5)*F(I-1,5) 
      ENDCD 
C 
C COMPUTE B PRIME 
C 
      CODO N=1,5;M=1,5;K=2,KMAX-1 
      H(N,M)=B(I,N,M)-A(I,N,1)*B(I-1,1,M)-A(I,N,2)*B(I-1,2,M) 
     1 -A(I,N,3)*B(I-1,3,M)-A(I,N,4)*B(I-1,4,M)-A(I,N,5)*B(I-1, 
     2 5,M) 
      ENDCD 
C 


Figure 2-2. Proposed Recoding of Scalar 3-D Code Taken From STEP (Cont.) 
C INSERT LUDEC AGAIN 
      CODO K=2,KMAX-1 
      L11=1./H(1,1) 
      L21=H(2,1) 
      U12=H(1,2)*L11 
      L22=1./(H(2,2)-L21*U12) 
      U13=H(1,3)*L11 
      U14=H(1,4)*L11 
      U15=H(1,5)*L11 
      L31=H(3,1) 
      L32=H(3,2)-L31*U12 
      U23=(H(2,3)-L21*U13)*L22 
      L33=1./(H(3,3)-U13*L31-U23*L32) 
      U24=(H(2,4)-L21*U14)*L22 
      U25=(H(2,5)-L21*U15)*L22 
      L41=H(4,1) 
      L42=H(4,2)-L41*U12 
      L43=H(4,3)-L41*U13-L42*U23 
      U34=(H(3,4)-L31*U14-L32*U24)*L33 
      L44=1./(H(4,4)-U14*L41-U24*L42-U34*L43) 
      U35=(H(3,5)-L31*U15-L32*U25)*L33 
      L51=H(5,1) 
      L52=H(5,2)-L51*U12 
      L53=H(5,3)-L51*U13-L52*U23 
      L54=H(5,4)-L51*U14-L52*U24-L53*U34 
      U45=(H(4,5)-L41*U15-L42*U25-L43*U35)*L44 
      L55=1./(H(5,5)-L51*U15-L52*U25-L53*U35-L54*U45) 
C COMPUTE LITTLE R S 
      D1=L11*F(I,1) 
      D2=L22*(F(I,2)-L21*D1) 
      D3=L33*(F(I,3)-L31*D1-L32*D2) 
      D4=L44*(F(I,4)-L41*D1-L42*D2-L43*D3) 
      D5=L55*(F(I,5)-L51*D1-L52*D2-L53*D3-L54*D4) 
C COMPUTE BIG R S 
      F(I,5)=D5 
      F(I,4)=D4-U45*D5 
      F(I,3)=D3-U34*F(I,4)-U35*D5 
      F(I,2)=D2-U23*F(I,3)-U24*F(I,4)-U25*D5 
      F(I,1)=D1-U12*F(I,2)-U13*F(I,3)-U14*F(I,4)-U15*D5 
      I=JMAX-1 
  200 I = I-1 
      CODO N=1,5;K=2,KMAX-1 
      F(I,N)=F(I,N)-F(I+1,1)*B(I,N,1)-F(I+1,2)*B(I,N,2) 
     1 -F(I+1,3)*B(I,N,3)-F(I+1,4)*B(I,N,4)-F(I+1,5)*B(I,N,5) 
      ENDCD 
      IF (I.GT.2) GO TO 200 
C 
      CODO J= 
      S(K,L,1 
      S(K,L,2 
      S(K,L,3 
      S(K,L,4 
      S(K,L,5 
      ENDCD 
C 
   20 CONTINUE 
C 


Figure 2-2. Proposed Recoding of Scalar 3-D Code Taken From STEP (Cont.) 
DIMENSION Q(100,100,6,100),X(100,100,100),Y(100,100,100), 
1 Z(100,100,100),D(100,5,5),A(100,5,5),B(100,5,5), 
2 C(100,5,5),S(100,100,5,100),F(100,5),H(5,5) 

DO 20 L=2,LMAX-1 


RLOADI L,2 STARTING VALUE 

SUBX LMAX,ONE,&S0001 


CODO J=1,JMAX;K=2,KMAX 

MPYX KMAX,JMAX,&S0002 FORM VECTOR LENGTH MAXIMUM 


RJ=Q(K,L,6, J) 


PACK KMAX,8<S«DQ1,S'S0003 
PACK SsS0002 , &&DSP , &0003 
SHI FT I S^S0001,6,S«S0004 
ADDX S(?<DSP,S'S0004,&&DSP 
MPX S<&DQ2, SIX, S'S0004 
MAP, F=GATHR, R1=&S0003, R: 


S/S<D01 CONTAINS DIMENSIONS OF Q 
FORM -TEMP VECTOR AT DYNAMIC SPACE 
ITEM COUNT CONVERT TO BIT ADDRESS 
UPDATE DYNAMIC SPACE POINTER 
COMPUTE SKIP DISTANCE IN Q 
:?iS0004, Wl=?<0003. 


XK=(X(K+1,L,J)-X(K-1,L,J))*DY2 
YK=(Y(K+1,L,J)-Y(K-1,L,J))*DY2 
ZK=(Z(K+1,L,J)-Z(K-1,L,J))*DY2 

PACK 2<fcXL,&&DSP,&D0003 
SHIFTI Si&XL, 6, &S0005 
ADDX &S0005,&&DSP,&&DSP UPDATE DSP 
PACK HUNDRD,&DX,8cD0004 RECORD LENGTH 

MAP, F=GATHR,R1=8<D0004, R3=TNTHSND, W1=?,D0003.' 

PACK «^&YL, &?^DSP, StDOOOS 

SHIFTI &&YL,6,&S0005 

ADDX 8<S0005,?«8<DSP,8£8(DSP UPDATE DSP 

MAP, F=6ATHR, R1=?<D0005, R3=TNTHSND, W1=«£D0005 

PACK &&ZL , &8/.DSP , 8£D0006 
SHIFTI &&ZL,6,&S0005 

ADDX ?/S0005,58<DSP,?c&DSP UPDATE DSP 

MAP,F=GATHR,R1=D0006,R3=TNTHSND,W1=8<D0006 

MAP, R2=8<DDY2, D=BCAST. 

BUFF,WB1=S2,E=S,F=000. ' SETUP BROADCAST OF DY2 IN BUFFER 

(Continued) 


Figure 2-3. Hand-Compiled Example of a Segment of FORTRAN Code 


PACK' TNTHSND, &DXK, S<DXK 
PACK TNTHSND, !«DYK, ?'DYK 
PACK TNTHSND-,&DZK,SfDZK ' 
ADD X S I XFOUR , S<D0003 , &D0007 
SUBX &DC)0037 TWCiWDS,2<D000S 
PACK TNTHSND , S<D0007 , ?<D0007 
PACK TNTHSND , ?<DOOOS , S/DOOOS 

ADDX SIXFOUR,&D0005,S<D0009 
SUBX ?jD00057TWOWDS7&Dt3010 
PACK TNTHSND, &D00097&DOOO^ 
PACK TNTHSND, 8tD0010,?tD0010 

ADDX SI XFOUR, &D0006, ?<D001 1 
SUBX ?{D0006,TWOWDS,&D0012 
PACK TNTHSND, SiDOOl 1 , 2<DOO 1 1 
PACH- TNTHSND, S:D0012,S«D0012 


MAP, R1=S/D0007, R2=8<D0008, W1=8<DXK=VU. 

BUF, A=RBl,B=l,E=O,F-000. BROADCAST FROM BUFFER 
VEC, F= ( A-B ) -H-D , A=S 1-, B=S2 , D=RB 1 , W 1 =AR 1 . 


MAP , R 1 =&D0009 , R2=?«D00 1 0 , W 1 =?<DYK=VU . 

BUF,A=RB1,-B=i-,E=0-,F=000. BROADCAST DY2 FROM BUFFER 
VEC , F= < A-B ) -H-D , A=S 1 , B=S2 , D=RB 1 , W 1 =AR 1 . 

MAP, Rl=SiD001 1,R2=8(D0012, W1=S«DZK=VU. 

BUF, A=RB1,B=1,E=0,F=000. BROADCAST DY2 FROM BUFFER 

VEC , F= ( A-B ) *D , A=S 1 , B=S2 , D=RB 1 , W 1 =AR 1 . 

XL=(X(K,L+1,J)-X(K,L-1,J))*DZ2 
YL=(Y(K,L+1,J)-Y(K,L-1,J))*DZ2 
ZL=(Z(K,L+1,J)-Z(K,L-1,J))*DZ2 


Figure 2-3. Hand-Compiled Example of a Segment of FORTRAN Code (Cont.) 
Figure 2-4 provides some summary information about the ‘assembly language’ form 
in which the FMP code is presented, as compiled by a ‘pseudo compiler’. Some of 
the conventions, such as using a special character to delineate compiler-generated 
variables, descriptors and arrays, have been taken from the STAR FORTRAN compiler 
scheme. A brief description of the code follows. 

The operation RLOADI stands for the scalar function LOAD REGISTER with 
IMMEDIATE data. The value '2' will appear as part of the actual instruction. The 
register called L will be defined as a permanent register out of the 256 available to 
the programmer in the Scalar Processor. 

SUBX is the operation SUBTRACT INDEX (or address). The register called ONE 
is canonically defined as register 16 in all scalar units of the STAR family. The 
temporary register &S0001 is set up to be used at the DO loop termination sequence 
(not shown in Figure 2-3). 

MPYX stands for Multiply Index value (non floating point). The result will be the 
vector length to be processed for the metric arrays X, Y and Z. The CODO statement 
permits the compiler to generate a series of GATHER RECORD operations to form a 
long vector, which makes the vector arithmetic more efficient. Figure 2-5 gives a 
visualization of the matrix as it would be stored in memory if the dimensions of 
X, Y, Z and Q were (10,10,10) and LMAX, KMAX and JMAX were each 10. The 
numbers in the blocks indicate their sequential storage addresses. Thus Q(1,1,1) would be 
block 00, Q(2,1,1) would be block 01, Q(1,10,1) would be block 90, and Q(1,1,2) would 
be block 100. 
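The block numbers quoted above follow from FORTRAN's storage order, in which the first subscript varies fastest. A minimal sketch of that address calculation, with a hypothetical function name and the (10,10,10) dimensions of the example as defaults:

```python
def block_address(k: int, l: int, j: int, kdim: int = 10, ldim: int = 10) -> int:
    """Zero-based sequential block number of element (k, l, j).

    Subscripts are 1-based as in FORTRAN, and the first subscript varies fastest.
    """
    return (k - 1) + kdim * (l - 1) + kdim * ldim * (j - 1)


if __name__ == "__main__":
    for subscripts in [(1, 1, 1), (2, 1, 1), (1, 10, 1), (1, 1, 2)]:
        print(subscripts, "-> block", block_address(*subscripts))
```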

The CODO statement creates a vector operation that, for each J, removes from memory 
a vector of length KMAX-2. This action results in a new vector consisting (referring 
to Figure 2-5) of blocks 00 through 09 followed by blocks 100 through 109, blocks 200 
through 209 and so on up to block 909. The GATHER RECORD operation makes a 
random reference to memory for the first element addressed (J=1,2,3 . . . JMAX) and 
then retrieves the data following (K=2,3 . . . KMAX) at ‘near-pipeline rates’ of eight 
64-bit operands per minor cycle. The columns of data gathered in this manner are 
stored sequentially in memory. 
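The effect of GATHER RECORD can be pictured as copying fixed-length records, separated by a constant stride, into one contiguous vector. The sketch below models memory as a Python list; the function and parameter names are hypothetical and say nothing about how the Map Unit is actually implemented.

```python
def gather_record(memory, base, record_length, stride, record_count):
    """Concatenate `record_count` records of `record_length` words each.

    Record r starts at address base + r * stride. Only the first word of each
    record needs a random memory reference; the rest follow sequentially.
    """
    gathered = []
    for r in range(record_count):
        start = base + r * stride
        gathered.extend(memory[start:start + record_length])
    return gathered


if __name__ == "__main__":
    memory = list(range(1000))  # each word holds its own block number
    vector = gather_record(memory, base=0, record_length=10, stride=100, record_count=10)
    print(vector[:12], "...", vector[-1])
```

With the 10 by 10 by 10 layout of Figure 2-5, a record length of 10 and a stride of 100 reproduce the sequence of blocks 00 through 09, 100 through 109, and so on up to 909.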

A series of scalar instructions preceding the GATHER instruction forms the descriptors 
to be used by the Vector and Map Units. The instruction PACK merges the rightmost 
sixteen bits of a register into the sixteen-bit length field of another register (which normally 
contains the array base address) and places the result in another register (in this case 
temporary, compiler-assigned descriptors). 
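A descriptor of this kind can be modelled as a single integer holding a length field and an address field. The sketch below assumes a 64-bit register with the sixteen-bit length in the upper bits and a 48-bit address in the lower bits; that layout and all names are assumptions made for illustration, since the text states only the width of the length field.

```python
LENGTH_BITS = 16
ADDRESS_BITS = 48  # assumed: remainder of an assumed 64-bit register
LENGTH_MASK = (1 << LENGTH_BITS) - 1
ADDRESS_MASK = (1 << ADDRESS_BITS) - 1


def pack(length_register: int, base_register: int) -> int:
    """Merge the rightmost 16 bits of one register into the length field of
    another register, keeping that register's address bits."""
    length = length_register & LENGTH_MASK
    address = base_register & ADDRESS_MASK
    return (length << ADDRESS_BITS) | address


def unpack(descriptor: int):
    """Return (length, address) from a packed descriptor."""
    return descriptor >> ADDRESS_BITS, descriptor & ADDRESS_MASK
```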

Temporary vectors (which will never be transmitted to Backing Store) are assigned 
dynamically in the same manner as used on the STAR-100 computers. A fixed register, 
called the Dynamic Space Pointer (DSP), contains the address of the first available 
location in free (unused) memory. Temporary vectors are allocated and deallocated in 
this region of Main Memory by using the DSP, and then updating the DSP to the next 
free location. 

The operation SHIFTI (shift register immediate) shifts the field length (which is an item 
count of 64-bit words) left six places to form the bit address with which the DSP can 
be updated. 
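Taken together, the DSP and the six-place shift amount to a simple bump allocator working in bit addresses. A minimal sketch under that reading follows; the class and method names are hypothetical, and deallocation is shown only as resetting the pointer to an earlier value.

```python
class DynamicSpace:
    """Bump allocator for temporary vectors, addressed in bits."""

    def __init__(self, first_free_bit_address: int = 0):
        self.dsp = first_free_bit_address  # Dynamic Space Pointer

    def allocate(self, word_count: int) -> int:
        """Reserve `word_count` 64-bit words and return their bit address."""
        base = self.dsp
        self.dsp += word_count << 6  # words * 64 = bits, as SHIFTI by six does
        return base

    def release_to(self, bit_address: int) -> None:
        """Free everything allocated at or after `bit_address`."""
        self.dsp = bit_address


if __name__ == "__main__":
    space = DynamicSpace()
    first = space.allocate(10)
    second = space.allocate(5)
    print(first, second, space.dsp)
```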

The MAP instruction sets up the READ 1 trunk with the base address of Q and a 
record length of KMAX (number of words in each column or record), and the READ 3 
trunk with the increment used to proceed through memory for every J (refer to the previous 
discussion of Figure 2-5, where J=1, block=00; J=2, block=100; the memory increment 
would then be 100). The WRITE 1 bus, W1, is set up with the address of the temporary 
vector, which will later be assigned to the variable RJ. The function code GATHR 
indicates a GATHER operation. The presence of a field length in the READ 1 setup 
indicates that the operation is to be a GATHER RECORD. 
•  Scalar code uses mnemonics identical to those implemented for the STAR-100 
   computer family. 

•  The form for vector machine code is: 

      Unit name (Vector, Map, Buffer, or Swap, or the first letter of those 
      names . . V, M, B, S) 

      Subfunction field name (R1 . . means READ 1 setup) 

      An = sign followed by the value for that field 

•  All internal scalars created by the compiler are given sequential names beginning 
   with ‘&S’. Thus the first scalar temporary created by the compiler would be 
   &S0001. 

•  All internal vector temporaries created by the compiler are given sequential names 
   beginning with ‘&V’. Thus the first vector temporary would be named &V0001. 

•  Descriptors (register file pointers to vectors and arrays) are assigned internal names 
   which begin with ‘&D’ followed by the array name. Thus a vector declared 
   by the programmer in a DIMENSION statement such as 

      DIMENSION AAA(100) 

   would have a descriptor assigned to it with the name &DAAA. Likewise an internal 
   vector temporary created by the compiler with the name &V0001 would have 
   a descriptor assigned with the name &D0001. 

•  An example of the form of the language, a memory-to-memory vector addition 
   operation, is given as: 

      MAP,R1=&DAAA,R2=&DBBB,W1=&DCCC=VU. 

      VECTOR,F=A+B,A=RB1,B=RB2,W1=AR1. 

   The READ 1, READ 2 and WRITE 1 setups take their base addresses and vector 
   lengths from the register-file descriptors &DAAA (for array AAA), 
   &DBBB (for array BBB), and &DCCC (for array CCC). The WRITE 1 setup 
   statement includes the expression =VU, which indicates that the WRITE 1 data is 
   to come from the Vector Unit (VU). 

   The Vector Unit instruction indicates a function code of a simple add (see 
   3.2.1.160 of the functional specification for a complete list of codes and their 
   representation). The A operand stream will come from S1 (Source 1 from the 
   Map Unit) and the B operand stream will come from S2 (Source 2 from the 
   Map Unit). The WRITE 1 bus output will come from the AR1 trunk (Arithmetic 
   Result 1). 


Figure 2-4. Explanation of Machine Language Coding 
for FMP Produced by FORTRAN Compiler 


(Diagram: blocks of sequentially numbered storage locations, drawn along three axes labelled J, L and K.) 


Figure 2-5. Storage Allocation for Flow Variables in Main Memory Allocation 
At the conclusion of the MAP operation, a vector of length 10,000 (if JMAX, KMAX and 
LMAX=100) will be formed in memory and assigned the descriptor &D0003, which will be 
changed (by code not shown) to &DRJ. 

The machine coding which follows basically repeats the descriptor setup sequence just 
depicted for the elements of Q, but for the metric array elements X, Y, Z. After forming 
the temporary vector &D0006, the compiler generates the MAP function and BUFFER 
function which together use the READ 2 stream (R2) to transfer the quantity DY2 to the 
Buffer Unit as a broadcast operand. Since the GATHER operations use only READ 1 and 
WRITE 1, this operation can proceed concurrently with the MAP operation preceding it. 

The setup of temporary descriptors &D0007, &D0008, &D0009, &D0010, &D0011 and &D0012 
essentially offsets the starting addresses of the X, Y and Z vectors that have been gathered 
previously by + and - one word (adding 64 to a bit base address is the same as offsetting 
the address by one word). 

The MAP, BUF and VEC instruction sets that appear at the end of Figure 2-3 accomplish 
the vector subtraction of the near-adjacent elements (K+1, K-1) of vectors X, Y and Z. 

The result of the subtraction is multiplied by the broadcast value of DY2, which has been 
preloaded into the buffer, producing two floating-point operations per pair of input X 
operands. In 32-bit mode this process would yield 32 floating-point operations per minor 
cycle, for a computation rate of 3.2 gigaflops. Note that this operation is memory-to-memory, 
since the vectors are too long for the buffer (except for the broadcast quantity DY2). 
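The quoted rate can be reproduced with simple arithmetic. The sketch below assumes that sixteen 32-bit results emerge per minor cycle (twice the eight 64-bit operands per minor cycle mentioned earlier) and that the minor cycle is 10 nanoseconds; both values are assumptions chosen because they are consistent with the 32 operations per cycle and 3.2 gigaflops stated in the text, and the names are illustrative.

```python
RESULTS_PER_CYCLE_32BIT = 16  # assumed: twice the eight 64-bit operands per cycle
FLOPS_PER_RESULT = 2          # one subtract and one multiply per (K+1, K-1) pair
MINOR_CYCLE_NS = 10.0         # assumed: implied by 32 flops/cycle = 3.2 gigaflops


def gigaflops(flops_per_minor_cycle: float, minor_cycle_ns: float) -> float:
    """Operations per nanosecond, which is numerically the rate in gigaflops."""
    return flops_per_minor_cycle / minor_cycle_ns


if __name__ == "__main__":
    per_cycle = RESULTS_PER_CYCLE_32BIT * FLOPS_PER_RESULT
    print(per_cycle, "flops per minor cycle,", gigaflops(per_cycle, MINOR_CYCLE_NS), "gigaflops")
```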

The next sequence of code would be the formation of the XL, YL and ZL components. 

A ‘dumb’ compiler would produce two GATHER RECORD operations for each metric 
array (X(K,L+1,J) and X(K,L-1,J)) instead of remembering that the X(K,L,J) gathered for 
the XK, YK and ZK metrics computations could be retained in memory and used in 
the next step of L to provide the L-1 elements. 

The amount of ‘smarts’ necessary to accomplish the retention of previously gathered vectors 
in X, Y or Z is no different than the intelligence needed to retain the counterpart scalar 
values between one computation or iteration in a DO loop. 
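The retention strategy amounts to keeping a sliding window of three gathered planes and fetching only the new L+1 plane at each step. A minimal sketch, in which `gather_plane` stands for whatever routine performs the GATHER RECORD for one value of L (a hypothetical callable, not part of the report):

```python
from collections import deque


def sweep_planes(gather_plane, lmax):
    """Yield (l, lower, centre, upper) for L = 2 .. lmax-1.

    Each plane is gathered exactly once; the planes for L-1 and L are reused
    from the previous step instead of being gathered again.
    """
    window = deque((gather_plane(1), gather_plane(2)), maxlen=3)
    for l in range(2, lmax):
        window.append(gather_plane(l + 1))  # the oldest plane drops out when full
        lower, centre, upper = window
        yield l, lower, centre, upper


if __name__ == "__main__":
    gathered = []

    def demo_gather(l):
        gathered.append(l)
        return f"plane {l}"

    for step in sweep_planes(demo_gather, lmax=6):
        print(step)
    print("planes gathered:", gathered)
```

A compiler that gathered L-1 and L+1 afresh at every step would issue roughly twice as many gathers for the same sweep.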

E. Examination of the implicit code in light of the memory accesses required revealed that two 
distinct approaches were dictated. First, the assumption that the entire code could be 
performed in 32-bit mode gave hope that the total data base and all temporary vectors 
could be held resident in Main Memory. It was therefore necessary to determine whether 
this was true, given the explosion of temporaries that occurs when long vectors are created 
for efficiency. Second, the potential need for 64-bit accuracy in the calculations made it obvious 
that a 64-bit version could not be held in Main Memory; thus access patterns for the 
Backing Store had to be examined. Finally, it had to be determined whether the calculations 
themselves could be done in the time required for the desired FMP response to customers. 

First, consider the 32-bit case. Using the example of left-hand-side coding given in Figure 2-2, 
the basic memory requirements are: 

1. Flow variables Q(100,100,6,100) = 6,000,000 

2. Metrics X(100,100,100), Y(100,100,100), Z(100,100,100) = 3,000,000 

3. Update matrix S(100,100,5,100) = 5,000,000 

In 32-bit mode this would require 7,000,000 64-bit words of the 8,000,000 proposed for 
the FMP. 

To make the GATHER and subsequent vector computations more efficient by using long 
vectors, it is desirable to gather planes in the J direction for this segment of the code. 

Thus there would be resident in Main Memory at any one time in left-hand-side solution, 
a number of planes of data, each 10,000 32-bit words in length (5000 64-bit words): 

1. Six planes of Q flow variables: 30,000 words. 

2. Three planes of metric data (to keep the L, L+1, and L-1 planes available 
for the next step in L), each in the X, Y and Z directions for a total of 
nine planes: 45,000 words. 

3. A plane’s worth of A, B, C and D data with 25 elements for each point 
in the plane (five by five block): 500,000 words. 

4. A plane of the update array S, in gathered form: 25,000 words. 

5. A plane each for the H and F intermediate matrices used in the BTRI 
sequences (5 by 5): 375,000 words. 

The grand total of large data temporaries is then 975,000 words which, added to the 7 million 
words for major variables, gives 7,975,000 words. An 8 million (nominal) word FMP actually 
contains 8,388,608 words of memory. The operating system requires 65,536, thus leaving 
8,323,072 words for useful storage. The remainder after allocating known temporaries 
and flow variables is then 8,323,072 - 7,975,000 = 348,072 words. Since the BTRI 
sequences will use the vector buffers for intermediate storage, this remainder seems adequate 
at this time to hold a 32-bit version of the three-dimensional implicit code, in the form 
available for this study. 
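
The storage budget above can be checked with a few lines of Python. This is a bookkeeping sketch only: the dictionary keys are labels chosen here, and the temporary sizes are copied from the list above rather than derived.

```python
# Major arrays, counted in 32-bit elements (two per 64-bit word).
major_32bit_elements = {
    "Q": 100 * 100 * 6 * 100,
    "X, Y, Z": 3 * 100 * 100 * 100,
    "S": 100 * 100 * 5 * 100,
}
major_words = sum(major_32bit_elements.values()) // 2

# Large temporaries, in 64-bit words, as quoted in the list above.
temporaries = {
    "Q planes": 30_000,
    "metric planes": 45_000,
    "A, B, C, D blocks": 500_000,
    "S plane": 25_000,
    "H and F matrices": 375_000,
}
temp_words = sum(temporaries.values())

installed_words = 8 * 2**20          # nominal 8 million words
operating_system_words = 2**16       # 65,536 words
usable_words = installed_words - operating_system_words
remainder = usable_words - (major_words + temp_words)
```

The remainder of 348,072 words is about 4 percent of usable storage, so the 32-bit case fits with little margin.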

It is obvious from this example that a 64-bit version would substantially overflow the Main 
Memory capacity. If the major portion of the data base for the 64-bit version must reside 
in Backing Store (flow variables and metrics plus the S array) then the capability must exist 
to transfer the data required by the calculations at a sufficiently high rate to match the 
computation rate, in order to achieve the performance objectives of the FMP. The scheme 
proposed by NASA Ames personnel consists of storing data in the Backing Store matrix in 
a manner different from that used in the 32-bit mode. In this 64-bit mode case, a basic 
transfer block of 32,768 words for the Backing Store has packed in it all of the variables needed 
for a given point in the mesh. Thus the six Q values and three metric (X,Y,Z) values 
would be stored in a single physical block. The vector lengths for each of the nine vectors 
in the block would be 32,768/9=3640 elements maximum, which is efficient for the Vector 
Unit and Map Unit to process memory-to-memory or within the vector buffer (which is 
8192 words long). 
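
The vector length quoted for the packed block follows from integer division, as the short Python check below shows. The constant names are labels introduced here for illustration.

```python
BLOCK_WORDS = 32_768       # basic Backing Store transfer block
VECTORS_PER_BLOCK = 9      # six Q values plus the X, Y and Z metrics
BUFFER_WORDS = 8_192       # vector buffer length quoted above

max_vector_length = BLOCK_WORDS // VECTORS_PER_BLOCK
unused_words = BLOCK_WORDS - max_vector_length * VECTORS_PER_BLOCK
fits_in_buffer = max_vector_length <= BUFFER_WORDS
```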

If the dimensions of the mesh are 100,100,100 (referring again to the storage scheme of 
Figure 2-5) a non-integral number of columns of data from the mesh will reside in a block. 
Since the intent is to move data from the Backing Store in ‘slabs’ (see Figure 2-6), it would 
be better to always allocate integral rows and columns to physical storage blocks. Thus for 
a 100,100,100 mesh, 30 columns of each of the major variables would be stored in each 
physical block. This means that Q(1,1,1,1) through Q(99,3,1,1) would be stored contiguously, 
followed immediately by Q(1,1,2,1) through Q(99,3,2,1). The last flow variable in the 
physical block Q(99,3,6,1) would be followed by the first metric X(1,1,1). 

The computational sweep in the J direction would then require the transmission of the first 
physical block, the nth physical block, the 2nth block, and so on until a ‘slab’ in the J 
direction is completely transferred to Main Memory. Figure 2-6 shows such a slab for a 
mesh of 10,10,10, for a single variable. The physical blocks in this case contain the 
following: 

Physical Block 1 = blocks 00-29 

Physical Block 2 = blocks 30-59 

Physical Block 3 = blocks 60-89 

Physical Block 4 = blocks 90-99 

Physical Block 5 = blocks 100-129 

Figure 2-6. Slab Slicing of Matrices for Backing Store Transfers 

Proceeding to input a slab in the J direction for Figure 2-6 then would transfer 
physical blocks 1, 6, 11, 16 and so on. 

Note two effects of this storage system: 

1. A certain percentage of the physical block transfers contain no useful data. 
Thus the transfer efficiency is affected by that amount. 

2. The maximum length vectors without performing a GATHER operation are 
an integral number of columns in length, and the GATHER operation in any 
direction is now aperiodic since the data for J=1,2,3 is not stored at regular 
memory intervals because of the ‘holes’ left in some physical blocks, and 
matrices are no longer homogeneous (elements of other matrices are stored 
sequentially imbedded in other matrices in actual physical storage). 

The programming of this technique utilizing BUFFER IN/BUFFER OUT and CODO 
constructs (see Section 4) remains as a great challenge to enliven the next phase of 
this study. 
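
The column-to-block allocation listed above for the 10,10,10 example can be sketched in Python. This is an illustration under stated assumptions, not the allocation scheme itself: it assumes 100 columns per plane, 30 columns per physical block, and that a physical block never spans two planes; the function names and the "fill fraction" measure are introduced here.

```python
def physical_block_ranges(columns_per_plane, columns_per_block, planes):
    """Return (first, last) column numbers held by each physical block.

    A block is never allowed to run past the end of a plane, so the last
    block of each plane may be only partly filled (a 'hole').
    """
    ranges = []
    for plane in range(planes):
        base = plane * columns_per_plane
        for start in range(0, columns_per_plane, columns_per_block):
            end = min(start + columns_per_block, columns_per_plane) - 1
            ranges.append((base + start, base + end))
    return ranges


def fill_fraction(columns_per_plane, columns_per_block):
    """Fraction of allocated block capacity that holds useful columns."""
    blocks = -(-columns_per_plane // columns_per_block)   # ceiling division
    return columns_per_plane / (blocks * columns_per_block)


ranges = physical_block_ranges(100, 30, 2)
fill = fill_fraction(100, 30)
```

Under these assumptions the first five ranges match the list of physical blocks given above, and one block in four per plane is only a third full.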

A slab in the J direction would require the transfer of one hundred physical blocks 
(or ten blocks if the dimensions were as in Figure 2-6) between Main Memory and 
Backing Storage. This would mean the allocation of 100*32,768 or 3,276,800 words 
of Main Memory for buffering of the slab. In the J direction, computation (and the 
necessary SCATTER/GATHER operations) cannot proceed until all data is in place in 
Main Memory. To keep the computational rate up, a slab must be moved into a 
buffer while calculations are being performed on a different slab in another buffer. 

This would then require 2*3,276,800=6,553,600 words of Main Memory, with no room 
to hold the S matrix. Thus the S matrix may have to be combined with the Q and 
X, Y, Z matrices in physical blocks. It appears that sufficient Main Memory capacity 
exists to support this scheme, however. 

If this allocation and transfer scheme is feasible then the transfer rates must be 
investigated. The method employed by this ‘slab’ technique implies a full transfer of 
all mesh, metric, and update matrix variables into and out of Main Memory during the 
J direction sweep. It then appears possible to bring in physical blocks constituting the 
full plane at J=1 and to solve both sets of equations (K,L sweep) while the plane 
remains in Main Memory. Referring to the example in Figure 2-6, this would mean 
holding blocks 00 through 99 in Main Memory for the solution in the columnar and 
row-wise directions. The amount of data transferred would be one full transfer of all 
variables for a single K,L sweep of the mesh. Total data would then be 14 million 
words (9 million for flow variables and metrics and 5 million for S update array). 

If all the metric, mesh and update data is intermingled in physical blocks, then all 
data must be moved both ways (to and from the Backing Store) even though many 
variables such as the X, Y, Z metrics are not updated, and thus would not otherwise 
find their way back to Backing Store. Taking the two sets of transfers then, 14 million 
words would be transferred to Main Memory for the J sweep, 14 million for the K,L 
sweep, and the entire 14 million transferred back at the end of that particular processing. 
Four transfers (counting both directions) of 14 million words equals 56,000,000 words 
per time step. If 256 time steps is a representative example then 1.434*10^10 words 
would be transmitted for a single problem solution taking about eight minutes. 

This volume of data transfer would require 30,000,000 words per second to be moved 
at sustained rates. This becomes .3 words per clock cycle. The Backing Store is capable 
of providing .2 words per clock cycle with the present design at a sustained rate. The 
conclusion to be drawn is that a different physical block allocation scheme should be 
devised which reduces the total data transferred by at least 10 percent, since the volumetric 
efficiency of the physical block storage (because of the holes left in the blocks) is not 
100 percent, and the transfer rate is slower than desired. Alternative transfer rates are 
possible with the Backing Store, but should be decided on quickly since they affect the 
major structure of the Backing Store design. 
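
The transfer-rate figures in the preceding two paragraphs can be reproduced with the following Python sketch. The variable names are chosen here for illustration, and the 10 nanosecond clock cycle is the value quoted elsewhere in this report for the FMP.

```python
words_per_transfer = 14_000_000    # Q, metrics and S together
transfers_per_step = 4             # counting both directions
time_steps = 256
solution_seconds = 8 * 60          # "about eight minutes"
clock_seconds = 10e-9              # 10 ns clock cycle (quoted elsewhere)
sustained_words_per_cycle = 0.2    # present Backing Store design

words_per_step = words_per_transfer * transfers_per_step
total_words = words_per_step * time_steps
words_per_second = total_words / solution_seconds
words_per_cycle = words_per_second * clock_seconds
exceeds_backing_store = words_per_cycle > sustained_words_per_cycle
```

The required rate works out to just under 30 million words per second, or about .3 words per clock cycle, which is above the .2 words per cycle the present design sustains.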

With the limitations just discussed it appears that the memory access patterns for 
Backing Store transfers are sustainable for the known 3-D code. The 32-bit mode 
version exhibits similar access difficulties, but recent analysis (see results of simulation 
in this section) indicates that the Map Unit can support the computational rate 
required under worst-case conditions — in 32-bit mode. 


CONCLUSIONS OF ANALYSIS IN THIS INTERIM PHASE 

A great deal is being learned about the behavior of the implicit three-dimensional code when programmed 
for the FMP. It appears that more analysis must be done involving the tradeoffs between cost, performance, 
size and bandwidth of the memory systems when concentrating on the implicit code. For example, it is not 
altogether clear that a much lower-cost, higher-capacity Main Memory would not be more desirable than the 
system presently proposed for the FMP. Programmability and compiler optimization are severely impacted by 
the need for clever slicing schemes to move data between Backing Store and Main Memory. 

The GPSS model that is being developed may provide the necessary tool for trying different forms of Main 
Memory and Backing Store. The next phase will provide the opportunity to hand-compile the balance of 
the key segments to determine the best memory approaches. 

The major analytical effort in this phase ended up focusing on storage capacity and bandwidth, and to a 
minor degree on the capability of performing non-sequential access to memory for the J sweeps. These 
have emerged as first-order effects on the performance of the FMP. In the next study phase, the 
second-order effects must also be analyzed more closely to determine if the pipelines and Map Unit can be 
overlapped efficiently (and without great programming difficulty), and whether they are properly matched to the 
problem. 


FUNCTIONAL DESIGN 

The Phase 1 Extension effort produced a proposed structure for the FMP and the overall system in which it 
is to be imbedded. In this report all hardware design has been focused on the FMP and in particular those 
parts which are newly designed and not borrowed to some extent from the STAR-100 family. Thus con- 
siderable effort has been applied to the Vector and Map Units which are the key to the arithmetic bandwidth 
of the FMP. The description of the resulting design may be found in hardware specification form in 
Appendices A and B of this report. The Instruction Specification gives the summary of how the FMP 
performs each operation, whilst the Functional Specification illuminates such material as the behavior of the 
particular arithmetic system chosen, transfer and clock rates, and maintenance interface data. 

BLOCK DIAGRAMS 


The revised block diagrams of the FMP can be found in the Functional Specification. More detailed 
design work has been done on the Map Unit, but the level of block description given in the specification 
represents that in which confidence can be placed at this time. Formal specifications for the Programmable 
Device Controller are currently under way, with completion due by summer 1978. Detailed specifications 
for equipments already in place in standard systems, such as the high performance disks, have been omitted. 
An unscheduled and extraordinary amount of the total project effort has been expended on the analysis 
and redesign of the FMP and its instructions. This factor led to other portions of this study being reduced 
in scope over what was originally planned. This was essential, since the programming and timing estimates 
and the block simulation efforts had to await the creation of a workable engineering design of the new 
components. 


Comparison With/Difference From Phase 1 Design 

Examination of the block diagrams and description in the Functional Specification will reveal several major 
changes in design since the release of the Phase 1 Final Report. 

The most significant change is the more detailed structure of the Vector Unit, wherein there are three sets 
of identical functional elements (two front-end adders, two multipliers and two back-end adders) and three 
checking elements instead of the two called out in the Phase 1 plan. Further, the fully general interconnection 
scheme of Phase 1, wherein any element could be connected to any other, has been reduced to a more 
practical (from an engineering and parts count point of view) set of interconnections. This constraint then 
led to the definition of explicit interconnections called out by the allowable arithmetic instructions. The 
result of this redesign is a significant reduction in hardware, and a substantial increase in the checking of 
results. This is due to the fact that although there are six separate arithmetic elements, the front-end adder 
is not a full floating-point adder, but contains only the prenormalize networks needed for initiating a 
floating-point addition, and the back-end adder has only the post-normalize network, and is shared by the 
multiply element for forming the final sum of all partial sums and carries generated by the multiply element. 
Although this back-end adder is shared by the multiply element, an auxiliary port (which required very little 
hardware) has been provided to bring in another operand to be added to the product. Checking probabilities 
are enhanced since at any one clock cycle two of the six elements will be idle (and thus possibly checking 
their partners) because of the constrained instruction set that permits a maximum of three floating-point 
operations to be called out at a time (for example A+(C*D) leaves one multiplier and one back-end adder 
free). 

SECDED is now carried within the Vector Unit on all trunks not imbedded in arithmetic elements, rather 
than parity bits. 

The vector buffer, while still physically contained within a Vector Unit, is programmed separately with its 
own instruction (9E). 

SECDED is now carried within the Buffer Unit, instead of simple parity. 

The Buffer Unit has only two READ ports instead of four. This reduces the hardware parts count and 
also the risk of not being able to get fast enough RAMs for this unit by 1980. 

Nine Vector Units (vector pipelines) form an arithmetic ensemble instead of eight, as described in the 
Phase 1 report. A simple method for providing ports for the nine units in the Map Unit, and a method 
for switching one pipeline off-line and another on-line, made it possible to provide for quick recovery in 
the event of a pipeline failure. 

All bit addressing has been eliminated from the Map Unit. All bit strings used for control vectors and 
order vectors must begin on a 32-bit boundary. Alignment of bit strings to these boundaries can be 
done by the Scalar Processor with its copious idle time. To facilitate string logicals and alignment, the 
Scalar Processor has added to it two double-length shift instructions (operation codes 20 and 21 hex) which 
are not available on the STAR-100. 

The Swap Unit has been given a Backing Store map table to enable setting regions busy for monitor 
purposes, or to permit explicit input/output in a limited form to be performed by the job mode program. 

Instruction Specification 

The instruction specification may be found in appendix A of this report. This specification gives the 
behavior of all FMP instructions. To permit using portions of the actual STAR-100 design and its docu- 
mentation system, the FMP Instruction Specification was designed to not overlap the STAR-100 features, 
but to appear to be a mutually exclusive functional entity. Thus instead of changing an existing 
instruction (such as Vector Add Upper, op code 80) to become the Swap Unit instruction, the FMP defines 
such an instruction as illegal, and uses one of the STAR-100 illegal instructions as the SWAP instruction 
(56 op code). The purpose in this is to permit future expansion of the FMP instruction set to embody 
desirable STAR instructions, but more importantly to open the avenue of STAR-100 simulation of the 
FMP and vice-versa, since all instruction decode and control on both machines is done with microcode. 

Functional Specification 

The Functional Specification may be found in appendix B of this report. Certain functions have not yet 
been defined, and are indicated by the phrase “to be defined”, or “designed at a later time”, or some 
similar disclaimer. Certain other functions which correspond to their STAR-100 counterparts have not yet 
been designed, but the STAR-100A feature is described to give the “flavor” of the function. An example 
would be the description of microcode loading and diagnostic control for the FMP which is, as an interim, 
taken directly from the STAR-100A Functional Specification. 

Rationale for Design Approaches 

After a year of consideration, analysis, design and redesign, it is felt that the FMP structure given in the 
included specifications represents a reasonable engineering approach to providing computations in excess of 
one billion per second. As the design undergoes more detailed study some changes are made, and in other 
cases convictions grow stronger regarding the approaches taken. Several items that were examined following 
the release of the Phase 1 report, and in response to commentary on that report are presented here. 

Logic Family 


A second-generation logic family of high-speed ECL LSI was proposed as a means for keeping the “real 
estate” and the parts count for the FMP down to a reasonable level, not to mention the reduction of 
power and cooling requirements. As was pointed out, this choice of a non-existent but promising 
technology was the leading technological risk for the project. RADL personnel still feel that pursuit of this 
goal of denser LSI parts (LSI-II) is important to the project but not essential to meeting the stated goals 
of performance and reliability. This can be achieved by continuing the reduction in complexity of the 
FMP hardware as understanding of its behavior with the actual mathematical codes improves. Efforts 
are underway to carefully analyze the design and construction of the FMP with existing LSI chips and 
packaging. 

On the other hand, the desirability of a cooler, smaller, faster LSI system for the FMP has motivated the 
pushing of semiconductor manufacturers to pursue the next-generation technology. For the purposes of 
this study, all estimates of space, power, and speed are based on the currently pursued parameters for this 
new generation. In future reports such quantities will be stated in terms of construction with either family 
of logic. 


Parts Count vs Performance Tradeoffs 

One of the major engineering concerns regarding the buildability of a machine as powerful as the FMP is 
the sheer numbers of parts, and hand-tooled interconnections as they affect reliability and maintainability. 
The nine pipelines proposed for the FMP are felt to be the limit of the number of parallel functional 
units that should be assembled in one place with the existing technologies. At present, these pipelines can 
produce 4800-million 32-bit floating-point results per second (at a 10 nanosecond clock cycle), peak rate. 

If studies indicate that 32-bit mode only could be applied to codes using the FMP, and if code analysis 
proves that the FMP could sustain computations at 50 percent of peak rate, then half as many pipelines 
would be needed (actually 5 instead of 9) with a consequent major reduction in hardware parts and a 
commensurate improvement in reliability. 

Pursuant to this parts reduction, the original generalized Vector Unit, with four read ports from the Buffer 
Unit and total interconnectability, created larger parts counts per unit than was felt desirable. Based on 
an analysis of the 3-D implicit code, it was discovered that a 20 percent reduction in parts in those areas 
would affect the performance of the FMP on that code (counting only vectorized processes) by less than 
3 percent (thus yielding an overall effect much less than 3 percent). It is in areas such as these that 
continued analysis of the codes, and their match with the hardware, is beginning to pay off. 

Choice of Instructions 

The basic instruction set for the FMP was derived from the STAR-100A. This provides leverage on the 
generation of diagnostics and utilities which can be executed in the Scalar Processor alone. Thus when the 
FMP is first powered up and in checkout, the wealth of software and checkout experience gained from 
the STAR-100A project can be applied to the FMP. With a completely checked out Scalar Processor and 
input/output system, checkout and maintenance of the remaining units can be greatly facilitated. 

Instruction extensions to the Scalar Processor were minimized to reduce the need for new scalar diagnostic 
development, and to limit somewhat the functions accessible by the monitor operating system. The two 
monitor instructions for setting up the Swap Unit and Map Unit and for communicating with the PDC were 
considered the basic minimum necessary to support the operating system. A vector SWAP counterpart to 
the STAR-100 register file SWAP was then included for Backing Store interchange. Finally, by limiting 
the actual VECTOR and MAP instructions to three, the remainder of the STAR instruction set could be 
made illegal or legal depending on needs of the software implementors (for simulation purposes) without 
supplanting from the instruction set the unique FMP instructions. Note that the 9D and 9E are variable 
length instructions depending upon how many individual streams are being set up, while the 9F is a fixed 
length vector instruction executed by the Vector Unit. In effect, each of the FMP vector instructions 
(Map, Buffer and Arithmetic) are somewhat microcoded versions of the original STAR vector operations. 

This ability to “build your own vector” is useful when generating object code for computations such as 
the Navier-Stokes solutions. 

The choice of a microcoded level of instruction also somewhat simplifies the engineering since, in many 
instances, a single bit in an FMP vector instruction controls a single gate or single fanout in the processor 
without the need for extensive decode and timing controls. 

Extensibility 

As noted above, at the discretion of the system developers, other STAR-type instructions could be invoked, 
either to be implemented by simulation or by additional hardware as the particular case warrants. Within 
each FMP instruction extra room has been left for extensions to be defined later. For example, all 
address fields are larger than the maximum allowable memory space requires today. Thus, although 8-million 
words of bipolar RAM appears to be the practical limit at this time for main memory construction, 
addressing has been retained for 32-million words. Likewise, the Backing Store addressing permits addressing 
1-billion words of data, while only 256-million words seems practical or necessary at this time. 
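
The headroom in the address fields can be expressed as a count of address bits. The Python sketch below assumes binary multiples (one "million" words taken as 2**20 and one "billion" as 2**30, consistent with the 8,388,608-word figure given earlier for a nominal 8 million words); the function name is introduced here, and the actual field widths are defined in the Instruction Specification, not by this calculation.

```python
def address_bits(words):
    """Minimum number of bits needed to address the given number of words."""
    return (words - 1).bit_length()


MEGA = 2**20   # assumed binary "million"
GIGA = 2**30   # assumed binary "billion"

main_memory_now = address_bits(8 * MEGA)
main_memory_limit = address_bits(32 * MEGA)
backing_store_now = address_bits(256 * MEGA)
backing_store_limit = address_bits(1 * GIGA)
```

Under these assumptions each address space is left with two bits of headroom, a factor of four in capacity.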

Finally, a number of instructions have been left undefined for both the STAR-100A and the FMP, both 
in the 32-bit scalar class and in the 64-bit vector class. As a function comes to light that would be 
desirable for STAR, a similar function might be included in the FMP. 

BLOCK-LEVEL SIMULATION 

One of the methods considered best for measuring the behavior of a particular design is to develop a 
computerized model of that design, submit to that model characteristic code sequences, and extract the 
predicted execution of those code sequences in the face of conflicts for resources such as memory, 
functional units, and input/output buses. To provide NASA with a tool which can be used to measure 
various kinds of computations on the FMP, a simulation model could be built which could then be run 
by NASA personnel at their initiative on whatever computers were available. Thus, various analytical 
teams could examine machine performance in their areas of interest. 

The purpose of this task was to supply a package containing the necessary materials to permit Ames 
researchers to create input decks of programs that might be run on an FMP and then run these programs 
through a model (provided as part of the package). The resultant data could provide timing information, 
location of significant bottlenecks, and storage access patterns for review of the Backing Store and 
input/output strategies. At the time of this writing the package is not yet in a form which can be used 
by NASA, but it should be ready by the time of submission of the final draft of this report. 

Simulation With GPSS vs LSISYS 

The large-scale computer development operation of Control Data (STAR-100 and FMP being examples) 
bases its design and construction cycle on a series of automated design and documentation tools, which 
ultimately produce magnetic tapes that are then used to automatically (or nearly so) fabricate the silicon 
chips and main circuit boards for the object computer systems. 

Three levels of simulation of a design provided in this system, called LSISYS (Large Scale Integration 
Simulation System), are General Block-Level Simulation, Detailed Block-Level Simulation, and Gate-Level 
Simulation. The first, and most general, of these consists of writing FORTRAN subroutines representing 
the behavior of a given block of logic (say the entire Swap Unit), and integrating them into the LSISYS 
main program. The second level, which can only be started when actual machine design is underway, 
consists of utilizing a library of basic logic blocks (for example, a basic 32-bit wide adder network among 
others), selecting the desired building blocks, placing them on “pseudo-boards”, and interconnecting all 
data trunks and controls as groups of wires. The third and most detailed level of simulation consists of 
placing actual arrays on actual boards (with software) and routing all real signals (by software). This final 
stage represents a complete verification of the design with the hardware as it will actually be built. 

Each level requires a certain amount of resource to prepare, with the least resource required by the highest, 
most coarse model for the machine and the most resources required by the gate-level simulation. Machine time 
requirements can be as high as 150 times greater for the gate-level simulation than for the same functions 
modeled at overall block level. The major advantage in using this LSISYS implementation is that any block 
or group of blocks may be replaced with their counterpart detailed block or gate models, and the entire 
assemblage run as one whole unit. Thus as design of various components proceeds at different speeds, the 
entire ensemble can be simulated to validate a particular gate design without requiring the remainder of 
the machine to be at the same level of gate design. The running time advantages of running in this mixed 
mode are also quite substantial over running the entire computer simulation at the gate level. 

The disadvantage of LSISYS is that one must be quite familiar with the inner structure of the simulator 
in order to write generalized block models which can be incorporated. Further, the intimate interconnection 
of simulator flags, variables, and parameters makes integration a lengthy affair of compilations involving the 
entire simulator. For more detailed design verification and analysis this disadvantage is outweighed by the 
amount of discrete information that can be obtained from the general level of simulation, while detailed 
block and gate simulation are easier to submit to simulation in all cases. 

At the same time as RADL investigators began to discover the amount of resources needed to integrate a 
basic block model of the FMP into the LSISYS simulator, Ames personnel disclosed their most immediate 
needs for simulation data. It became obvious that the level of detail required at this point in the FMP 
project was even more superficial than planned for the LSISYS block model. In particular, general statistics 
about the performance of the Main Memory and Backing Store under different loading conditions are a key 
concern of both RADL and NASA. Such a general, statistical model could be provided by more readily 
available, standard simulation systems such as ASPOL, SIMULA, or GPSS, which have been in existence on 
a variety of computer systems for many years. 

The peripheral disadvantage of using the LSISYS mechanism for this evaluation function by NASA analysts 
was that the system is not a standard, supported Control Data product, which is generally available. Instead, 
it is used and supported by the STAR Development Division, and can only run on a fully-configured 7600 
system. Choosing a general-purpose simulator such as GPSS could then make the basic block simulator 
system available on a more general class of computers. 

A brief examination of the simulators available to RADL via CYBER services and a survey of simulator 
experience among the project staff led RADL to select GPSS (General Purpose Simulation System) since it 
and a resident expert were readily available. Its availability on the general CYBER computers as well as 
IBM machines housed at Ames further justified the choice. 

It is expected that as design continues the GPSS model will be refined, and selected simulation analysis of 
throughput conducted in that system. As the detailed block-level design proceeds however, input will be 
prepared for LSISYS so that the actual hardware design can be verified for functional, as well as perform- 
ance, characteristics. 


Methodology for Simulation 

The FMP has been subdivided into the components of Main Memory, Backing Store and processing unit 
models, which are linked together. While the models for the memories are developed to represent as close 
as possible the actual hardware construction, the balance of the processor model consists of an instruction 
interpretation and decode, and execution timing segments for the swap, vector and map functions invoked 
by the decoded instructions. This elementary model does not process actual data, although vector addresses 
and lengths are examined from the input source code. 

The input to the model is a series of machine instructions, memory-contained descriptors for vectors, and 
initial register file contents. The simulator assumes that it is dealing solely with a job mode computational 
program and is able to time the execution of sequences of instructions, and to produce the memory access 
patterns resulting from various influences. While the model is running, input/output activity can be simulated 
by either random, or block sequential accesses. Output data is limited to execution time by instruction 
and memory activity at this time. 

The simulation package to be delivered under separate cover with the final draft of this report will also
include an itemized list of the regions of source input (to GPSS) where CYBER/IBM incompatibilities exist,
and necessary corrective actions to be taken if the system is to be run on IBM equipment.

After delivery of the simulator, and for the duration of any subsequent phases in which RADL is involved,
design updates will be provided for the simulator. To this end some form of documentation and simulator
update control system will be devised.

Relationship to Future Simulation 

The GPSS-level simulator is intended primarily for the use of analyst teams outside RADL to evaluate the
ability of the FMP to meet project objectives. It cannot be used to verify that the specified hardware
is capable of being built. That function still rests with LSISYS, which is the primary
design tool for the RADL FMP designers. Thus GPSS can be viewed as a management tool, while LSISYS
is the designer's tool. Since the input forms and even the amount of detail submitted to the two systems
differ widely, some form of control will need to be introduced to ensure that the LSISYS model and
the GPSS model actually represent the same FMP, for as long as GPSS is used by NASA personnel to
validate the FMP approach. Since LSISYS is capable of yielding the same data with a higher degree of
refinement and more closely represents the hardware being designed, it is suggested that, despite its stated
disadvantages, LSISYS be used as the continuing design review mechanism by NASA.

In short, GPSS provides a well-documented and expeditious means of getting some simulation results for
NASA use, but LSISYS is the long-run solution to ensuring that the real design meets Ames objectives.

Results of Simulation 

Development of the hand coding of the implicit code, and of the GPSS model to reflect accurately the design
of the FMP to date, has severely restricted the amount of data that could be obtained before the
conclusion of this phase. It was decided to encode a small portion of the J sweep that is intensively ‘memory
limited’, as data must be gathered for each J in the K,L planes. The computation of the metric
differentials and formation of the RHO variables into a long vector were hand coded for Figure 2-3. The result
of performing 34 scalar operations, 3 GATHER operations and 3 vector arithmetic operations over a
100 x 100 x 100 mesh is a computational rate of 933 megaflops. This sequence was chosen as the
worst case, since memory is being accessed in the most inefficient way.

The same computations in the L direction would achieve close to 3.2 billion floating-point operations per
second, while those in the K direction would attain a rate of 1.8 billion floating-point operations per
second.

A listing of the GPSS output is too voluminous to include in this report and is being delivered to Ames
under separate cover. This listing contains many statistical counts to illustrate the internal behavior and,
in particular, the bottlenecks encountered by a given code sequence.

In the next phase of this study effort, the complete left-hand-side calculation will be coded in extended
FORTRAN, hand-compiled, and passed through the simulator.


Section 3


FMP RELIABILITY ASSESSMENT 
INTRODUCTION 

The Phase I study for this project produced some reliability projections of a general nature as part of the 
Final Report. That phase did not include design beyond a conceptual stage and, therefore, reliability data, 
particularly for the FMP, was primarily subjective. One of the tasks for the Phase I contract extension was 
to carry the FMP design to a point which would permit making preliminary parts count estimates. These, 
in turn, could then be used for a reliability assessment with more credibility since it possesses a sounder 
base. 


METHODOLOGY 

Since the FMP design proposed by Control Data is based on the STAR-100A, its technologies, and extensions
of them, considerable actual detail exists and could be exploited. The basic technology, that of the
STAR-100A, is ECL LSI-I, 168-gate array integrated circuits in 52-pin leadless carriers, packaged up to 150
per printed circuit board.

The reliability analysis of the FMP was done by functional unit. Some of the units are anticipated to be
the same as those of the STAR-100A (or very nearly so). For these units actual parts counts were used
for printed circuit (PC) assemblies already designed. In addition, sufficient data was available to define an
average PC assembly (or model). The units to be developed uniquely for the FMP were then defined
sufficiently to determine an estimated number of assemblies (or boards) per unit. The model was then
applied to these board counts for a failure rate per functional unit.

RELIABILITY ANALYSIS 
MODEL PC ASSEMBLY 

The model, or typical PC assembly mentioned above, was derived from existing designs. The items which
contribute virtually all the failure mechanisms for this assembly are stated in Table 3-1 with counts and
failure rates.


TABLE 3-1. MODEL PC ASSEMBLY

Component Counts and Failure Rates

                                                 Expected Failure Rate
Component                              Count     (per million hours)

Board Vias                            18,500          0.00005
LSI Connectors                         8,468          0.0014
Solder Joints                            660          0.0003
Capacitors (ceramic)                     300          0.014
Omega Resistors (buried in board)      1,323          0.0004
LSI ICs                                  147          0.2


By extending these counts and rates and summing the results, a failure rate of 0.0465 per thousand
hours is obtained for the model, or average, board (assembly).
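
The extension and summation described above can be sketched in a few lines of Python. This is only an illustration: the dictionary and function names are hypothetical, and the counts and rates are simply those printed in Table 3-1. With the values as printed, the sum comes to roughly 47.1 failures per million hours, slightly above the 0.0465 per thousand hours quoted; the small gap may reflect rounding or a transcription difference in the table. The second function shows how the quoted board rate scales to a unit, using the 72-board Vector Unit listed later in Table 3-2.

```python
# Counts and per-component failure rates (failures per million hours),
# copied from Table 3-1 as printed.
MODEL_BOARD = {
    "Board vias": (18_500, 0.00005),
    "LSI connectors": (8_468, 0.0014),
    "Solder joints": (660, 0.0003),
    "Capacitors (ceramic)": (300, 0.014),
    "Omega resistors": (1_323, 0.0004),
    "LSI ICs": (147, 0.2),
}


def board_failure_rate_per_million_hours(components):
    """Sum count * rate over every component type on the board."""
    return sum(count * rate for count, rate in components.values())


def unit_failure_rate_per_thousand_hours(board_count, board_rate):
    """Scale a per-board rate (per thousand hours) by the boards in a unit."""
    return board_count * board_rate


rate = board_failure_rate_per_million_hours(MODEL_BOARD)
print(f"model board: {rate:.2f} per million hours "
      f"({rate / 1000:.4f} per thousand hours)")

# Vector Unit: 72 boards at the quoted model rate of 0.0465 per thousand hours.
vector_unit = unit_failure_rate_per_thousand_hours(72, 0.0465)
print(f"Vector Unit raw rate: {vector_unit:.4f} per thousand hours")
```

The Vector Unit figure produced this way, 3.3480 per thousand hours, agrees with the raw rate listed for that unit in Table 3-3.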

RELIABILITY PROJECTION BY FUNCTIONAL UNIT

The current STAR-100A Scalar Processor was used after deleting the associative memory portion. The STAR-100A
memory module was used but with substitution of an anticipated 4K memory chip for the present 1K chip. This
resulted in using a 0.2 x 10^-6 failure rate in place of 0.1 x 10^-6, which is the rate for the 1K chip. This module was
also used for the input/output buffer.

The existing designs for the PDC and 50-Mbit data set have also been directly applied to the FMP. This is also true
for the memory fanout, so for these three items, actual parts counts with their expected failure rates could be used.

The remaining units, or items, required estimation based on the level of design to date. These are the Map Unit,
Memory Interchange, Vector Unit (9), Input/Output Distributor, and Backing Store. The Backing Store design is
based on 65K CCD chips, 128 per board. The other units, or items, are new designs and have progressed to a
reasonable block level. Board count estimates were made and the above model board was used as the basis for reliability analysis.
Table 3-2 lists all the above units, their estimated board (PC assembly) counts, and approximate chip (IC) counts/board.


TABLE 3-2. BOARD COUNT BY UNIT

                          Estimated       Approximate
Unit                      Board Count     Chips/Board

Scalar Processor               14             147
Main Memory                  1216             181
Memory Interchange*             4             147
Memory Fanout                   6             112
I/O Unit - PDC**(8)           144             150
I/O Buffer                     38             181
I/O Distributor                 2             147
Map Unit                        6             147
Vector Unit***(9)              72             147
Backing Store                2100             143

*Includes Swap Unit
**Includes Data Set
***Includes Buffer Unit







The Memory and Input/Output Buffer consist of high-speed, bipolar chips packaged in a stack configuration; each
stack is 32,768 39-bit words (includes SECDED). The above board counts are for 8 million 64-bit words (plus SECDED)
of Main Memory and 54 million 64-bit words (plus SECDED) of Input/Output Buffer.

The PDC and data set comprise somewhat different technology in that the PDC consists primarily of TTL circuits and 
the data set is basically an analog device. In addition, these units are packaged differently and employ air cooling. 

The Backing Store design, for this analysis, assumes a 65K-bit CCD chip. It also would be packaged differently than
the LSI logic of the FMP. The above board count is for a Backing Store of 268 million 64-bit words (plus SECDED).

The results of the analysis are presented in Table 3-3, which gives both the raw failure rate and an expected
rate after taking SECDED into account. In addition, the mean time between failures (MTBF) by unit
is shown.


TABLE 3-3. EXPECTED FAILURE RATE BY FUNCTIONAL UNIT

                          Expected Failure Rate       Percent      MTBF (hours)
                          (per thousand hours)        Checked      with SECDED
Unit                      Raw         With SECDED     by SECDED

Scalar Processor           0.9319       0.8387          10              1,192
Main Memory               21.2448       0.0212          99.9           47,170
Memory Interchange*        0.1860       0.0930          50             10,753
Memory Fanout              0.2244       0.1122          50              8,913
I/O Unit - PDC**(8)        0.2896       0.2896           0              3,453
I/O Buffer                 0.6639       0.0007          99.9        1,428,571
I/O Distributor            0.0930       0.0650          30             15,384
Map Unit                   0.2790       0.1674          40              5,974
Vector Unit***(9)          3.3480       3.0132          10                332
Backing Store             67.6500       0.0677          99.9           14,771

Overall                   94.9106       4.6687                            214
*Includes Swap Unit
**Includes Data Set
***Includes Buffer Unit

A factor of 1000 improvement was used for SECDED correction in the memory functions, which seems
conservative because of the size of the memory (assuming good memory maintenance). This is because
one chip contains only one bit of a word and two failing chips have low probability of being in the
same word. However, if this assumed SECDED correction improvement is off by a factor of 10, the
total rate with SECDED would only change from 4.6687 to 5.4740; SECDED effectively removes memory
as a large contributor to the system failure rate.
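
The sensitivity claim above can be checked with a short Python sketch. The names below are hypothetical; the numbers are the Table 3-3 entries. It is assumed here that the three units listed as 99.9 percent checked (Main Memory, I/O Buffer, Backing Store) are the "memory functions" whose raw rates are divided by the SECDED improvement factor, while the other units keep their tabulated "With SECDED" rates.

```python
# Failure rates per thousand hours, taken from Table 3-3.
# Non-memory units: the "With SECDED" column is used unchanged.
NON_MEMORY_WITH_SECDED = {
    "Scalar Processor": 0.8387,
    "Memory Interchange": 0.0930,
    "Memory Fanout": 0.1122,
    "I/O Unit - PDC": 0.2896,
    "I/O Distributor": 0.0650,
    "Map Unit": 0.1674,
    "Vector Unit": 3.0132,
}

# Memory units: the "Raw" column, to be divided by the SECDED factor.
MEMORY_RAW = {
    "Main Memory": 21.2448,
    "I/O Buffer": 0.6639,
    "Backing Store": 67.6500,
}


def overall_rate(secded_factor):
    """Overall failure rate (per thousand hours) for a given SECDED factor."""
    memory = sum(MEMORY_RAW.values()) / secded_factor
    return sum(NON_MEMORY_WITH_SECDED.values()) + memory


def mtbf_hours(rate_per_thousand_hours):
    """Convert a rate per thousand hours into a mean time between failures."""
    return 1000.0 / rate_per_thousand_hours


for factor in (1000, 100):
    rate = overall_rate(factor)
    print(f"factor {factor:>4}: rate {rate:.4f}, MTBF {mtbf_hours(rate):.0f} h")
```

Under these assumptions a factor of 1000 reproduces the tabulated overall rate of 4.6687 and an MTBF of about 214 hours, and a factor of 100 gives a rate of about 5.47, in line with the figure quoted above.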

The raw failure rate indicates a system failure roughly twice a day with over 90 percent of these failures
being memory circuit chip failures (4K bipolar and 65K CCD). SECDED improves this to an expected
failure rate of about 3.4 per month. The majority (2/3) of the remaining failures with SECDED included
are in the Vector Units. Because of the great difficulty in providing correction of results in high-speed
arithmetic pipelines, a 9th functional pipeline has been added to the 8 required for operation. This, along
with the self-checking features built into the pipe, means that a failure in a pipeline can cause the
Maintenance Control Unit to replace the failing pipe. The job that was running at the time of the pipeline
failure is lost but no further time is lost. The failing pipe is connected to the maintenance processor to
be fixed off-line.


EFFECTS OF LSI-II ON RELIABILITY

There are two circuit developments that can have an effect on system reliability using LSI-I circuits: going
to a new generation of ECL LSI circuits and using a 256K CCD chip. It is hoped that a circuit density
improvement of 4 times can be achieved. This development should reduce the number of logic boards by
a factor of three (not four because it is expected that the number of coax connections on a board will
not increase by a factor of 4 but perhaps by a factor of from 1.5 to 2). The number of boards in a
pipeline is then expected to be reduced from 8 boards to 3 with reliability improving by a factor of
approximately 2.

The 256K CCD chip enables the Backing Store to be built with about 550 boards instead of 2,100. A 
reliability improvement of about 2 can also be expected here. (The boards become slightly more complex, 
the SECDED improvement factor decreases slightly, and the more complex chip will have a higher failure 
rate.) 

These changes can be expected to yield the failure rates per thousand hours shown below:

                                    Failure Rate     MTBF (hours)
                   Raw Failures     With SECDED      With SECDED

Logic boards           2.68            2.290               437
Memory                22.57            0.023            43,487
Backing Store         33.83            0.034            29,412

Overall               59.08            2.347               426


The raw failure rate is improved by 38 percent and the failure rate with SECDED by 50 percent.


FMP AVAILABILITY ASSESSMENT


Admittedly, some of the above material is subjective; this is of necessity at this early design stage. Detailed
design is lacking on most elements of the FMP, component and board counts are estimated, effects of
SECDED are speculative since sufficient data are not yet collected, and device failure rates and modes are
not yet established for devices of the future.

Since the failure rates previously mentioned show that raw failures are dominated by Main Memory and
Backing Store, and since these FMP units are ideally suited for the application of SECDED, a more objective
analysis of maintenance strategy for failures covered by SECDED is presented. Ideally, over 90 percent of
the raw failures in the FMP could be essentially eliminated from the overall failure rate if SECDED could
effectively cover the failures by correcting single-bit failures. The improvement factor of 1,000 used above
for SECDED results in an MTBF of about 5.4 years for Main Memory and about 1.7 years for Backing Store.
This leaves the overall MTBF of 214 hours dependent mostly on the Vector Units which have a spare unit
that can be switched off-line for repair.

The computer industry, throughout its history, wherein memories without error correction have been utilized,
has developed a cultural habit of immediately removing all symptoms (intermittent or solid) of memory
failure. With the advent of error correction (SECDED) in a memory system, single-bit memory failures are
corrected automatically, thus deferring the necessity of removal until a more convenient time. Allowing
single-bit failures to accumulate in the memory eliminates the consequent emergency maintenance time and
reduces remedial maintenance time. This not only increases the time available to the customer but reduces
the cost of memory maintenance.

To realize these benefits a memory maintenance strategy must be developed which allows the accumulation
of single-bit failures, with sufficiently low risk of system (double-bit) failures. This is done by exploring
the risks involved with accumulating single-bit failures in the memory until a planned periodic remedial
maintenance period. Such failures are corrected by the SECDED mechanism making the system appear to
have no failures, thereby improving the effective failure rate (or MTBF).

In memory systems having parity checking, all failures are considered fatal because the information stored
or read has no credibility. The term “Fatal” is used because usually operation is halted, and emergency
diagnosis and remedial action is taken to restore confidence in the memory. A memory system utilizing
SECDED corrects single-bit failures and detects double-bit errors which are considered fatal for the above
reason. It follows, then, that single-bit failures can be accumulated until a double-bit failure occurs. At
that time the memory could be swept clean of all failures restoring confidence in the system.

Such a memory maintenance strategy may be acceptable in some circumstances. However, since the
double-bit error like the parity error could occur at any time during customer use, its occurrence could be quite
costly.

On the other hand, fatal failure can be predicted as likely to occur at or beyond some specified time in
the future, given that the failure rate of the storage devices is specified. The probability, Pp, of such an
occurrence may be chosen so that it is quite likely that no fatal failure would occur during use (the
maintenance interval) if maintenance were scheduled at or before that time to remove all single-bit failures.
This strategy calls for removing all storage device failures every maintenance interval, M, such that the
probability of a fatal failure is no greater than Pp.

The probability, Pp, of a fatal failure (double-bit) occurring during a maintenance interval is one minus
the probability of success, no fatal failures, during the maintenance interval. This probability of success
is the product of the probabilities that the next failure will not be fatal for each interval between failures
within the maintenance interval. These latter probabilities of success are one minus the probability that
the failure will be fatal (double-bit). Thus:


\[
1 - P_p = \left[1 - \frac{1\,(c-1)\,P_a}{T-1}\right]
          \left[1 - \frac{2\,(c-1)\,P_a}{T-2}\right]
          \cdots
          \left[1 - \frac{n\,(c-1)\,P_a}{T-n}\right]
\]


where c is the number of bits in a SECDED word (including syndrome bits), Pa is the probability that the
two failing devices will have matching failing areas, T is the total number of storage devices in the memory
system, and n is the number of devices that fail during a maintenance interval. (Note that if devices always
fail in their entirety, Pa = 1; this is worst case.)

Parameters c, Pa, and T are generally known for a given system, based on its design. Using various values
of n, the equation can be solved until a Pp is obtained which is below an acceptable risk of a fatal failure
occurring during a maintenance interval. With n established, the maintenance interval, M, can be determined
from:


\[
M = \frac{n}{d\,T}
\]


where d is the device failure rate. For risk of less than 0.1 (10 percent), Pp can be translated into 
MTBF (in years) of fatal failures by: 


\[
\mathrm{MTBF} = \frac{M}{8760\,P_p}
\]
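
The procedure just described (increase n until the risk limit is exceeded, then derive M and MTBF) can be sketched in Python as below. This is an illustration under stated assumptions, not the analysis tool used for the report: the function names are hypothetical, the k-th factor of the product is taken to be 1 - k(c-1)Pa/(T-k), the device failure rate d is taken as failures per hour, and the example call uses the Main Memory parameters tabulated later in this section (c = 39, Pa = 1, T = 159,744, d = 2 x 10^-7, Pp = 0.01).

```python
HOURS_PER_YEAR = 8760


def fatal_probability(n, c, p_a, t):
    """Probability of a double-bit (fatal) failure after n device failures."""
    success = 1.0
    for k in range(1, n + 1):
        # Chance that failure k shares a SECDED word with an earlier failure.
        success *= 1.0 - k * (c - 1) * p_a / (t - k)
    return 1.0 - success


def max_accumulated_failures(c, p_a, t, risk):
    """Largest n for which the fatal-failure probability stays within risk."""
    n = 0
    while n + 1 < t and fatal_probability(n + 1, c, p_a, t) <= risk:
        n += 1
    return n


def maintenance_interval_hours(n, d, t):
    """M = n / (d * T), with d in failures per device-hour."""
    return n / (d * t)


def fatal_mtbf_years(interval_hours, risk):
    """MTBF = M / (8760 * Pp), valid for small risk values."""
    return interval_hours / (HOURS_PER_YEAR * risk)


# Main Memory parameters as tabulated in this section.
n = max_accumulated_failures(c=39, p_a=1.0, t=159_744, risk=0.01)
m = maintenance_interval_hours(n, d=2e-7, t=159_744)
print(f"n = {n}, M = {m / 24:.1f} days, "
      f"MTBF = {fatal_mtbf_years(m, 0.01):.1f} years")
```

With these inputs the sketch yields n = 8, a maintenance interval of roughly 10 days and an MTBF of about 2.9 years, matching the Main Memory results tabulated in this section.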


This procedure can now be applied to the FMP Main Memory and Backing Store. For this analysis, an
acceptable risk of fatal failure, Pp, during a maintenance interval, M, is assumed to be 0.01 (1 percent).
The other parameter which must be assumed for lack of established data is Pa, the probability that failing
devices will have matching failing areas. For this reason a worst-case value of 1 is used, assuming the
entire device fails. With all other parameters known for a given design:


Parameter        Memory         Backing Store

Pp               0.01           0.01
Pa               1              1
c                39             523
d                2 x 10^-7      2 x 10^-7
T                159,744        267,776

Results:

n                8              3
M                10 days        2 days
MTBF             2.9 yrs        0.5 yrs

These MTBF figures fall short of those stated earlier, most notably for the Backing Store. The
improvement factor resulting from this analysis of SECDED can be determined using the raw failure rates (or
MTBF) from Table 3-3 for Main Memory and Backing Store and the MTBF figures determined above with
SECDED.

The improvement factor determined in this manner is somewhat in excess of 500 for Main Memory and
300 for Backing Store. It should be noted, however, that a worst-case value of 1 was assumed for Pa.
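
As a rough check of these factors, the ratio of the SECDED MTBF to the raw MTBF can be computed directly. The sketch below is illustrative only; the function names are hypothetical, the raw rates are the Table 3-3 values (21.2448 and 67.6500 per thousand hours), and the SECDED MTBF figures are the 2.9-year and 0.5-year results of the maintenance-interval analysis.

```python
HOURS_PER_YEAR = 8760


def raw_mtbf_hours(raw_rate_per_thousand_hours):
    """Raw MTBF in hours from a failure rate per thousand hours."""
    return 1000.0 / raw_rate_per_thousand_hours


def improvement_factor(raw_rate_per_thousand_hours, secded_mtbf_years):
    """Ratio of the MTBF with SECDED to the raw MTBF."""
    secded_mtbf_hours = secded_mtbf_years * HOURS_PER_YEAR
    return secded_mtbf_hours / raw_mtbf_hours(raw_rate_per_thousand_hours)


main_memory = improvement_factor(21.2448, 2.9)
backing_store = improvement_factor(67.6500, 0.5)
print(f"Main Memory:   {main_memory:.0f}")
print(f"Backing Store: {backing_store:.0f}")
```

The ratios come out at roughly 540 for Main Memory and just under 300 for Backing Store, consistent with the factors stated above.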

Referring back to Table 3-3 and substituting the above values of MTBF for Main Memory and Backing
Store, the overall MTBF for the FMP becomes 207 hours as opposed to the 214 hours given in Table 3-3.

As stated above, Pa = 1 was used since a value is not yet established by statistical data. If this turns out
to be smaller (and it is reasonable to expect this to happen) the results of this analysis are improved
considerably. For example, Pa = 0.3333 for Main Memory produces the factor of 1000 improvement with
SECDED over raw failures.

Backing Store, on the other hand, would require a Pa of 0.12 to obtain a 1,000 times improvement by
SECDED. However, one more parameter deserves consideration; this is c, or the number of bits in a
SECDED word. Design considerations thus far have established the Backing Store as 512 data bits plus
11 SECDED bits for c = 523. If a data size of 128 is used (c = 128 + 9 = 137), a Pa of about 0.45
would produce the factor of 1000 improvement. This may become a powerful argument for a 137-bit
SECDED word size for Backing Store as design progresses.

The above changes in Pa and c for Main Memory and Backing Store also produce improved
values of n, the number of accumulated failures, and M, the maintenance interval. The changes and the
results are summarized below.

Parameter        Main Memory    Backing Store

Pp               0.01           0.01
Pa               0.3333         0.45
c                39             137
d                2 x 10^-7      2 x 10^-7
T                159,744        280,576

Results:

n                15             9
M                19 days        6 days
MTBF             5.6 yrs        1.7 yrs


SOFTWARE DESCRIPTION


THE PROGRAMMING LANGUAGE 

In the first phase report, the subject of programming languages for the FMP was discussed. The conclusion
was that FORTRAN, despite its technical deficiencies, was the most likely language to be readily adopted
by the applications programmers. The language, PASCAL, is rapidly becoming the predominant system
programming language and is therefore recommended as the language in which to write operating systems
and compilers for the FMP complex. The specification of the PASCAL dialect should be accomplished
in the third design investigation phase of the NASF project.

The purpose of this particular report is to introduce the FORTRAN language extensions felt to be
essential for the programming of the FMP. After review of several alternatives with NASA investigators
and Control Data FORTRAN specialists, it was determined that the most probable solution to the language
problem was the absolute minimization of new language constructs and syntax. This was deemed necessary
because of the time required to implement wholly new language features in an existing compiler, while at
the same time trying to create new object code output from that compiler optimized for a new
architecture such as the FMP.

Several key decisions were made. They are: 

• Management of the Vector Buffer Unit would be reserved for the compiler, as the
compiler manages the Register File for the CDC STAR computer or the X, A and B
registers for the CDC CYBER family.

• Management of the Backing Store would be left to the programmer, using the statements
BUFFER IN and BUFFER OUT as modified in this proposal.

• The STAR FORTRAN vector facility requiring explicit descriptions of the vector lengths
in arithmetic statements would be abandoned.

• The programmer must aid the compiler in producing vector operations by describing
regions in which the compiler is to perform vectorization.

• Some facilities must be provided to assist the programmer in moving code which
presently exists in subroutines into in-line sequence, thereby reducing the overhead
attendant to JUMP operations, in order to make vectorization easier.

Compilers have difficulty vectorizing programs containing many subroutine calls since the FORTRAN CALL
doesn’t restrict the behavior of such subroutines, thus permitting recursion or vector overlap to nullify the
possibilities of vectorization of the calling routine.


THE LANGUAGE PROPOSAL


It was originally intended to provide a full FMP language specification in this report. The amount of
time and resources available for this phase made such a detailed specification impossible, particularly after
many man-hours had been consumed in fruitless pursuit of several bankrupt alternatives.

What follows then, is a proposal for a language which would have to be fleshed out in full ‘spec’ form 
in subsequent phases of the NASF study. 

Base Language 

The basic language for the FMP should be FORTRAN which conforms to the ANSI Standards of 1966,
to be replaced by the 1978 ANS Standard for which approval is expected soon. For the FMP first
installation a FORTRAN based on the 1966 standard would be acceptable since compilers for this version
abound on most standard computer products that might be considered for the Front-End Processors.
Conversion of the NASF to the 1978 standard should await the availability of compilers at least as mature
as those now extant for commercial computers.

The choices available with Control Data Computers are the CDC CYBER FORTRAN Extended compiler with 
extensions to permit array dimensions of up to 5 (refer to the CDC FORTRAN Extended Reference 
Manual, publication number 60497800) or the STAR-100 FORTRAN compiler (refer to the CDC STAR 
FORTRAN Language Reference Manual, publication number 60386200). Obviously, similar compilers are 
available on a wider range of equipments, but the Control Data investigations have been directed toward 
modification of the CDC compilers for this purpose. 

The Extensions 


CODO Statement 

The CODO (Concurrent DO) statement invokes a block of FORTRAN code terminated by an ENDCD
statement. The form of the CODO statement is:

      CODO i = e1, e2 (,e3)

or:

      CODO i = e1, e2 (,e3); j = e4, e5 (,e6)

or:

      CODO i = e1, e2 (,e3); j = e4, e5 (,e6); k = e7, e8 (,e9)

where i, j and k are integer variables, e1 through e9 are integer expressions, and the terms (,e3), (,e6),
and (,e9) are optional parameters.

The CODO and ENDCD statements define a block of code which must meet the
following restrictions:

• No subroutine or function calls are permitted in a CODO block, with the exception of
certain built-in functions (implicit functions) such as SQRT. In the case of allowed functions
the compiler does not generate a generalized subroutine call, but either includes the object
code in-line, or generates a special call to a vector FORTRAN subroutine (if the object time
parameter is of type vector).

• No branch statements, IF statements, or ASSIGN statements are permitted.

• No input/output statements are permitted.

• The FORTRAN programmer is assuring the compiler that the variables declared in
the CODO block do not conflict in storage, via FORTRAN ‘tricks’ such as EQUIVALENCE
or overlaid data due to passed parameters in subroutines pointing to conflicting arrays.

In exchange for these restrictions the FORTRAN programmer is assisting the compiler in the generation of 
optimal code for the FMP, using vector operations. 

Simple CODO Statements 

The expression: 

      CODO I=1,100
      A(I)=B(I)+C(I)
      ENDCD

results in the generation of a string of code which, when executed, will perform a memory-to-memory
vector add of the elements in vectors B and C.

The expression: 

      CODO I=1,100;J=1,100
      A(I,J)=B(I,J)+C(I,J)
      ENDCD

performs the element by element sum of arrays B and C, placing results in A. 
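
The intended meaning of these two simple CODO blocks can be illustrated with a small Python sketch. This is only a model of the semantics described above, not of the code the FMP compiler would generate; the function names are hypothetical, and plain Python lists stand in for FORTRAN arrays. The key point is that each block is treated as one whole-array operation on conformal operands.

```python
def codo_add_1d(b, c):
    """Model of: CODO I=1,N / A(I)=B(I)+C(I) / ENDCD as one vector add."""
    if len(b) != len(c):
        raise ValueError("operands must be conformal (same length)")
    return [bi + ci for bi, ci in zip(b, c)]


def codo_add_2d(b, c):
    """Model of: CODO I=1,N;J=1,M / A(I,J)=B(I,J)+C(I,J) / ENDCD."""
    if len(b) != len(c):
        raise ValueError("operands must be conformal (same shape)")
    # Each row is itself added as a vector, giving an element-by-element sum.
    return [codo_add_1d(b_row, c_row) for b_row, c_row in zip(b, c)]


print(codo_add_1d([1, 2, 3], [10, 20, 30]))
print(codo_add_2d([[1, 2], [3, 4]], [[10, 20], [30, 40]]))
```

Because no element of the result depends on another, the order in which elements are computed does not matter, which is what allows the block to be mapped onto vector operations.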

Complex CODO Statements 

Many statements in CODO blocks must be analyzed carefully by the compiler to produce optimum code
for the FMP. In the previous examples, the relationship between the source FORTRAN statement and
the resultant object code is clearly seen. In most instances this relationship becomes obscured, and thus
more complex forms of object code are produced.

The expression: 

      CODO I=1,100
      RR=A(I)+B(I)
      ENDCD

forms a one-hundred-element temporary vector called RR, and places therein the sum of the vectors A
and B. Temporary vectors produced in this manner in CODO blocks cannot be referenced outside of
CODO blocks, although one CODO block may reference a temporary vector produced by another CODO
block. Temporary vectors of this type may not be referenced by subscripts (that is, RR(1)) within the
CODO block.

The expression:


      CODO J=1,100
      D(I,K)=A(J,I,K)+B(J,I,K)
      ENDCD

produces a temporary vector of one hundred elements containing the sum of the columnar elements of A
and B, and stores the result in a sequentially arranged column in the array D at the effective memory
location of D(J,I,K). Note, however, that D is only defined as a two-dimensional array. Thus the third
columnar dimension is created by the compiler as a temporary region for storage of data. In effect then,
the array D contains a two-dimensional matrix, each element of which is a 100-element vector.

As in the previous case, arrays produced in this way may only be referenced within and among CODO 
blocks. Thus they may not be passed as parameters or referenced in COMMON, EQUIVALENCE or 
BUFFER IN/BUFFER OUT statements. 

This form permits the direct inclusion of existing codes, in their present FORTRAN form, by simply 
‘blocking’ groups of vectorable statements into CODO blocks, without destroying the original form of 
computation and associated documentation. Note that this example is equivalent to the use of: 

      DIMENSION D(100,100,100),A(100,100,100),B(100,100,100)

      CODO J=1,100
      D(J,I,K)=A(J,I,K)+B(J,I,K)
      ENDCD

which would permit the programmer to reference the array D or any of its elements in any legal FORTRAN
manner. In Section 2, a sample coding is given of the kernel of the left-hand-side calculations. From
Figure 2-2 the following has been extracted to illustrate the use of the principles discussed so far.


100=       DO 20 L=2,LMAX-1
110= C***FILTRX
120= C
130=       CODO J=1,JMAX;K=2,KMAX
140=       RJ=Q(K,L,6,J)
150=       XK=(X(K+1,L,J)-X(K-1,L,J))*DY2
160=       YK=(Y(K+1,L,J)-Y(K-1,L,J))*DY2
170=       ZK=(Z(K+1,L,J)-Z(K-1,L,J))*DY2
180=       XL=(X(K,L+1,J)-X(K,L-1,J))*DZ2
190=       YL=(Y(K,L+1,J)-Y(K,L-1,J))*DZ2
200=       ZL=(Z(K,L+1,J)-Z(K,L-1,J))*DZ2
210=       D(J,1,2)=HDX*((YK*ZL-ZK*YL)*RJ)
220=       D(J,1,1)=HDX*(-OMEGA*(Z(K,L,J)*(RJ*(YK*ZL-ZK*YL))
230=      1 -Y(K,L,J)*RJ*(XK*YL-YK*XL)))
240=       D(J,1,4)=HDX*((XK*YL-YK*XL)*RJ)
250=       D(J,1,3)=HDX*((ZK*XL-XK*ZL)*RJ)

1. The CODO block produces vectors by varying both the J and K indices. The K index
is varied first from 2 to KMAX, then the J index is incremented and the K index scanned
through its range again. Lines 140 through 200 in the original scalar code formed a series
of scalar temporaries. Inclusion in the CODO block for this version causes the scalar
temporaries to become vector temporaries. In this case the compiler is capable of generating
a GATHER operation of KMAX-2 elements for each J from the arrays X, Y, Z, and Q. The
result of these GATHER operations is to produce a series of ‘invisible’ temporary vectors
(known only to the compiler, and unnamed in the FORTRAN source) of length (KMAX-2)*
JMAX.

2. The ‘invisible’ temporaries are then combined arithmetically according to the FORTRAN source
code to produce ‘visible’ temporaries (that is, named by the programmer), such as RJ, XK,
YK, ZK, XL, YL, and ZL. As described previously, these scalar-appearing temporaries are
actually vectors. Statements 210 through 250 then combine these temporaries to form the
array segments for D, which will be a three-dimensional array, each element of which is a
vector of length KMAX-2.

3. The truly scalar value OMEGA which is defined in scalar portions of the FORTRAN source 
input, is used as a scalar in the CODO block. In line 220, it is broadcast as an operand 
in the multiplication operation involving the array Z. 

4. The compiler will check all references to ensure that they are conformal (that is, all dimensions 
are equal for all operations). . A left-hand-side (of the equals sign) operand can have one 
‘invisible’ dimension provided by the compiler, based on the dimensions of the* operands on 

the right-hand side. However ^ references to that operand must use the same dimensionality. 
Thus the expression: 

CODO K=2,KMAX 
D(I,J)=A(K,I,J) 

ENDCD 

CODO L=1,LMAX 
Da,J)=A(LJJ) 

ENDCD 

is not allowed, since at compile time, the storage allocation and object code necessary to 
assign the array D cannot be determined because LMAX might not equal KMAX at execution 
time. 
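
The conformance rule can be illustrated with a small Python sketch. Python is used here only as executable pseudocode; it is not part of the proposed FORTRAN extensions. The function `check_conformance` and its input format are hypothetical: each CODO block is described by the text of its index range and the names of the arrays it assigns, and an array that would receive its ‘invisible’ dimension from two different ranges is rejected.

```python
class ConformanceError(Exception):
    """Raised when one array is assigned under two different CODO ranges."""


def check_conformance(blocks):
    """Check a list of (index_range, assigned_array_names) CODO blocks.

    The index range is compared as text because, at compile time, the
    values of symbols such as KMAX and LMAX are unknown.
    """
    invisible_dim = {}                      # array name -> range that sized it
    for index_range, targets in blocks:
        for name in targets:
            first = invisible_dim.setdefault(name, index_range)
            if first != index_range:
                raise ConformanceError(
                    f"{name} sized by {first!r} and again by {index_range!r}")
    return invisible_dim


# The two blocks from the text: D is assigned under K=2,KMAX and under L=1,LMAX.
try:
    check_conformance([("2,KMAX", ["D"]), ("1,LMAX", ["D"])])
    rejected = False
except ConformanceError:
    rejected = True
```

A real compiler would compare symbolic extents rather than strings; the string comparison is simply the most conservative stand-in for "cannot be proven equal at compile time".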


Boundary Conditions 

The CODO statement, in the forms discussed previously, causes a uniform application of the CODO indices
to all variables possessing those indices as subscripts (or implied subscripts when the compiler has generated
an internal, ‘invisible’, temporary vector). The addition of a special operator to each indexing statement
(.CCNT., the concurrent index operator) permits varying index values with differing results.

The statements: 

CODO I=1,100.CCNT.J=1,100
A(I)=B(J)+C(J)
ENDCD

are equivalent to:

CODO I=1,100
A(I)=B(I)+C(I)
ENDCD

since the operator .CCNT. indicates that the index J is incremented concurrently with the index I. This 
feature can be utilized in: 

CODO I=1,100.CCNT.J=1,200,2
A(J)=B(I)+C(I)
ENDCD

which forms the element by element sum of the arrays B and C and places the results in every other
element position of the array A (which obviously must be dimensioned at least by two hundred elements). 
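
The concurrent stepping of two indices can be sketched in Python, used here purely as executable pseudocode. The helper names `fortran_range` and `strided_sum` are illustrative assumptions, and Python lists stand in for FORTRAN arrays (so FORTRAN element X(k) is `x[k-1]`).

```python
def fortran_range(first, last, step=1):
    """Index values of a FORTRAN-style range 'first,last,step' (inclusive)."""
    return list(range(first, last + 1, step))


def strided_sum(b, c, n):
    """Model of: CODO I=1,n .CCNT. J=1,2n,2 / A(J)=B(I)+C(I) / ENDCD."""
    a = [None] * (2 * n)                    # A must hold at least 2n elements
    i_values = fortran_range(1, n)
    j_values = fortran_range(1, 2 * n, 2)
    for i, j in zip(i_values, j_values):    # I and J advance together
        a[j - 1] = b[i - 1] + c[i - 1]
    return a


b = [1.0, 2.0, 3.0]
c = [10.0, 20.0, 30.0]
a = strided_sum(b, c, 3)
print(a)  # [11.0, None, 22.0, None, 33.0, None]
```

The untouched positions are shown as `None` only to make the stride visible; in the FORTRAN case those elements of A would simply keep their previous contents.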

The .CCNT. operator requires the definition of certain end cases in the use of several concurrent indices:

1. The number of index steps implied by the values of the index limit and index step need 
not be identical for all index variables associated by the concurrent operator (.CCNT.). Thus 
the statement: 

CODO I=1,100.CCNT.J=2,99 

permits both the index I and the index J to have different starting and ending values. 

2. The use of .CCNT. implies that the two indices are synchronous. In the example in 1.
above, where I and J have different initial values, when I=1, J would be equal to 2, and
when I=98, J would be equal to 99.

3. When the termination values are unequal, each index with the lesser termination value
upon reaching termination takes on values of NULL for each index step of the remaining
index. Thus in the example given, when I=98, J=99; but when I=99, J can no longer
be incremented past its termination value of 99 and thus becomes NULL. The result of
this action can be to create a ‘bit bucket’ into which operands are discarded rather than
stored.

The expression: 

CODO I=1,100.CCNT.J=2,99 
A(J)=B(I) 

ENDCD 

would store 98 values from array B, beginning at element 1, into the array A, beginning
at element 2. A more meaningful example would be:

CODO K=1,100,I=1,100.CCNT.J=2,99
A(J,K)=B(I,K)+C(J,K)
ENDCD

which can generate object code for an ADD vector operation of length 100*100, but where
two elements from each 100 in the J direction are not stored. This particular operation
would result in the creation of a Control Vector by the compiler, with two zero bits in
each hundred. Another case is:

CODO K=1,100.CCNT.J=1,200,2
A(K)=B(J)
ENDCD

which would in effect compress out of the vector B every other element, placing the
result in the vector A. In this example the compiler would form an Order Vector
which would be applied in the Map Unit to perform a vector COMPRESS on the array B.
The Order Vector would contain an alternating ones and zeros pattern, and could be
generated at compile time, or at object time, since it is a fixed, non-data-dependent pattern.
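
The end cases above (NULL indices, the ‘bit bucket’, and the compress pattern) can be sketched in Python as executable pseudocode. The names `concurrent`, `store_with_null` and `compress` are hypothetical, `None` stands in for NULL, and the one/zero lists are only a model of the Control and Order Vectors, not their hardware format.

```python
from itertools import zip_longest


def concurrent(*ranges):
    """Step several inclusive (first, last, step) index ranges together.

    An index that is exhausted before the others yields None (NULL).
    """
    seqs = [range(first, last + 1, step) for first, last, step in ranges]
    return list(zip_longest(*seqs, fillvalue=None))


def store_with_null(b, size):
    """Model of: CODO I=1,100 .CCNT. J=2,99 / A(J)=B(I) / ENDCD."""
    a = [None] * size
    control = []                            # 1 = stored, 0 = discarded
    for i, j in concurrent((1, 100, 1), (2, 99, 1)):
        if j is None:                       # NULL index: the 'bit bucket'
            control.append(0)
        else:
            a[j - 1] = b[i - 1]
            control.append(1)
    return a, control


def compress(b, order):
    """Keep the elements of b whose order-vector bit is one."""
    return [x for x, bit in zip(b, order) if bit]


b = list(range(1, 101))                     # B(1)..B(100) hold 1..100
a, control = store_with_null(b, 100)

order = [1, 0] * 100                        # alternating ones and zeros
compressed = compress(list(range(1, 201)), order)
```

Here 98 of the 100 index steps store a value and the last two are discarded, which corresponds to the two zero bits per hundred described for the Control Vector.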

As has been stated previously, the object of creating the CODO construct is to permit the insertion of a 
minimum number of lines of new code into existing programs to assist the compiler in the vectorization 
process. The examples given in this report are only a recommended starting point for subsequent
programmability studies of the FMP.


LEVEL 2 Statement 


The form of the LEVEL 2 statement is:

LEVEL 2 array name 1, array name 2, ..., array name n

All arrays specified in a LEVEL 2 statement are assigned to Backing Store. The only references permitted 
to these arrays are via the BUFFER IN/BUFFER OUT statements. 

This feature permits the assignment of large data blocks to Backing Store by symbolic name. Array name
1 through array name n may appear in other FORTRAN declaratives (such as DIMENSION, INTEGER, etc.) 
or may be defined solely in the LEVEL 2 statement as: 

LEVEL 2 A(100,100,100),B(100,100) 

The compiler attempts to assign arrays greater than or equal to 32,768 words in length to an integral 
32,768 word block starting address. 
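
A rough Python sketch of this alignment rule follows. The 32,768-word block size comes from the text; the allocation policy in `assign` (place arrays in declaration order, rounding up to the next block boundary for any array of at least one block) is only an assumption about what "attempts to assign" might mean, and the function names are hypothetical.

```python
BLOCK_WORDS = 32768                         # words per block, from the text


def words(*dims):
    """Number of words in an array with the given dimensions."""
    n = 1
    for d in dims:
        n *= d
    return n


def blocks_needed(n_words):
    """Number of whole blocks required to hold n_words (ceiling division)."""
    return -(-n_words // BLOCK_WORDS)


def assign(arrays):
    """Return {name: starting word address} for (name, dims) pairs."""
    layout = {}
    next_free = 0
    for name, dims in arrays:
        n = words(*dims)
        if n >= BLOCK_WORDS and next_free % BLOCK_WORDS:
            # Large array: round the start up to a block boundary.
            next_free = blocks_needed(next_free) * BLOCK_WORDS
        layout[name] = next_free
        next_free += n
    return layout


# LEVEL 2 B(100,100), A(100,100,100) declared in that order.
layout = assign([("B", (100, 100)), ("A", (100, 100, 100))])
```

With this policy the 10,000-word array B starts at word 0, and the 1,000,000-word array A (31 blocks) is moved up to word 32,768 rather than starting at word 10,000.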


Additional Extensions 

The CODO and LEVEL 2 statements discussed above and BUFFER IN, BUFFER OUT, UNIT which follow 
represent what is considered to be the minimum essential extensions to the FORTRAN language. They 
assume a reasonable extension of the compiling and optimizing capabilities of known FORTRAN compilers,
such as the STAR FORTRAN compiler. The objective of minimizing extensions is, of course, to reduce
development and testing time for the compiler, and retraining time for programmers. However, the
programmers must be trained to understand the relationship of the CODO statement to object code efficiency,
a process which will necessarily be somewhat long and agonizing.

Other suggestions are offered to assist in the programming of the FMP in FORTRAN, but are not as 
essential as those few previously mentioned. The other suggestions are: 

• MACRO facility — Examination of the segment of the three-dimensional code in Section 2,
Figure 2-2, shows that the entire segment has been converted to in-line code. Thus the
subroutines XXM and BTRI have been included directly and the consequent subroutine
calls eliminated. This serves two purposes. First, in the case of XXM, unnecessary code
and conditional branch statements are eliminated, since when used in the AMATRX segment,
the conditions of the indices are known beforehand, and since the XXM routine is never
processing data on the boundaries in this instance, the overall code can be replaced. The
second purpose served is that by eliminating subroutine calls, the code can be blocked into
CODO segments more efficiently. As in scalar code optimization, the compiler can better
optimize code if it has a larger block of ‘uninterrupted’ code to deal with.

The scheduling of the GATHER operations, implied by the statements in Figure 2-2, lines
140 through 210, and lines 290 through 360, can be optimized more easily over the whole block
of CODO, ending in line 660, than it could be if the CODO ended at line 210 (as
if a subroutine call had been made to XXM). In this case, the GATHER operations for
Q(K,L,Y,J) could be initiated immediately after the GATHER operations have been completed
for X, Y, and Z, and can be accomplished concurrently with the calculation of the metric
cross products in lines 140 through 250.

The in-line coding of large segments of computation places a burden on the programmer in
both keypunching (inputting source statements) and maintaining congruence between each of
the in-line expansions of what would otherwise be a subroutine. Specifically, if the
subroutine XXM is kept in subroutine format, any changes in the calculations in XXM need be
made only once within the subroutine. If the subroutine is included in-line at the point
where it is called in the main program, then each version would have to be changed. A
means for reducing this problem is the inclusion in the language support system of a powerful
MACRO processor, which can recognize particular constructs, evaluate parameters, and generate
the necessary lines of FORTRAN source code. The most desirable MACRO processor would
be one which is imbedded in the language processor itself, since items such as the variable
attributes and lengths are readily available. However, no such MACRO facility is prescribed
as a standard for FORTRAN, and no compiler presently possesses such capability. To
minimize development cost then, a MACRO preprocessor, based on already operational
systems, should be provided. Two very powerful MACRO systems are available on commercial
equipment; they are called ML-I* (ref. 1) and STAGE-2** (ref. 2). There are a host
of other candidates available on non-CDC equipment.

If code development is to continue in a reasonably dynamic way through the lifetime of the
NASF, then the value of such a MACRO facility is extremely high. However, if the system
code becomes rather static, then the manual labor involved in creation and maintenance of
the code may not justify the inclusion, documentation, training and maintenance of a
sophisticated MACRO facility. At this juncture, Control Data would highly recommend
investigating and including such a MACRO facility (preferably one already in existence) for
the FMP and the front-end processors, operating solely on the front-end processors.

• Intrinsic Functions — Certain attributes of codes that may find their way onto the FMP
require the handling of data-dependent vectorization. The FMP hardware provides the
facility for manipulation of array data based on some selection criteria, and to some extent
the CODO statements can cause the compiler to generate operations using these facilities.
In other cases, however, the programmer must be aware of the vector nature of a given
conditionally-executed operation and should have direct access to that facility. This can
be accomplished by defining a set of built-in, intrinsic functions which might be:

VCMPRSS(b,A) — Compress array A by bit vector b

VMRG(A,b,C) — Merge elements of array A and C under control of bit vector b

VMASK(A,b,C) — Produce a vector consisting of elements from A corresponding to
one bits in b, and elements of C corresponding to zero bits
in b

VSEARCH(A.EQ.B) — Compare elements of A and B, return a scalar index variable
containing the position in the arrays at which the comparison
is met. Any legal FORTRAN relational operator (.NE.,.GT.,
.LT.,.LE.,.GE.,.NOT.) is permitted in the relational expression.

VCOMPARE — Compare elements of A and B forming vector of integers containing
the index position in the array, where the relation is met.

VSELECT — Compare elements of A and B forming a bit vector, with one bits
in each position wherein the relation is met.

The BIT attribute permitted by STAR FORTRAN, and the logical operators .AND.,.OR.,.XOR. 
would be used on bit strings to provide manipulation of the Order and Control Vectors, 
explicitly. 
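
The intended behaviour of several of these intrinsics can be sketched in Python as executable pseudocode. The lower-case function names mirror the suggested intrinsics but are illustrative only; passing the relation as a Python function, returning 1-based positions, and having `vsearch` return the first match (or 0 when there is none) are all assumptions, since the text does not fix these details. VMRG is omitted because its exact merge rule is not specified here.

```python
import operator


def vcmprss(b, a):
    """Compress: keep the elements of a where bit vector b holds a one."""
    return [x for bit, x in zip(b, a) if bit]


def vmask(a, b, c):
    """Take a[i] where b[i] is one and c[i] where b[i] is zero."""
    return [x if bit else y for x, bit, y in zip(a, b, c)]


def vselect(a, b, rel):
    """Bit vector with a one wherever rel(a[i], b[i]) holds."""
    return [1 if rel(x, y) else 0 for x, y in zip(a, b)]


def vcompare(a, b, rel):
    """1-based index positions at which rel(a[i], b[i]) holds."""
    return [i for i, (x, y) in enumerate(zip(a, b), start=1) if rel(x, y)]


def vsearch(a, b, rel):
    """1-based position of the first match, or 0 if none (assumed convention)."""
    hits = vcompare(a, b, rel)
    return hits[0] if hits else 0


a = [5, 2, 7, 4]
b = [5, 3, 1, 4]
bits = vselect(a, b, operator.eq)           # [1, 0, 0, 1]
kept = vcmprss(bits, a)                     # [5, 4]
mixed = vmask(a, bits, b)                   # [5, 3, 1, 4]
```

Combining bit vectors with `.AND.`, `.OR.` and `.XOR.`, as the BIT attribute would allow, corresponds to element-wise logical operations on lists such as `bits`.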

• Machine Language — Experience with programming portable modules in STAR FORTRAN
has shown that use of the ‘escape valve’, introducing in-line machine code via the scheme
called Q8 mnemonics, has resulted in undecipherable code, which is difficult to optimize by
a compiler since the compiler cannot control the resources of the various functional units
as closely as when no explicit machine code is allowed. Unless the compiler development
cannot utilize an existing compiler system as a base, and unless current optimization techniques
prove to be useless for the FMP (a very unlikely event), the use of a machine code escape
mechanism should be prohibited and not implemented in the compiler. If events require
the development of a brand new compiler with massive language changes, then it may be
necessary to introduce this form to provide early access to the FMP facilities while the
compiler is maturing.


Buffered Input and Output 

Explicit input and output can be initiated by the FORTRAN programmer for data transfers between the
Backing Store and the network input/output system, and between the Backing Store and Main Memory.
The mechanism for controlling this input/output activity is the use of the FORTRAN BUFFER IN and
BUFFER OUT statements.

The length of the buffer area in which the data is contained in memory should be an even
multiple of blocks for all files. Ordering the data in this manner provides the most economical use of
storage.

Any unit referenced in a BUFFER statement cannot be referenced in any other input or output statement; 
however, such units can be referenced in the unit positioning statements BACKSPACE, REWIND, and 
ENDFILE. Once buffered input/output is established for a logical unit in a FORTRAN program, all input 
and output for that unit must be buffered. 

The ENCODE and DECODE statements are most frequently used to process the data read into a buffer, or 
to gather and place data in a buffer for transmission to external files. Status of the peripheral device 
involved should be checked by the UNIT function before the buffer operation is begun. 


BUFFER IN Statement 

The execution of the BUFFER IN statement transfers data from the unit specified, in the mode given, to
Level 2 storage locations first to last.

Form

BUFFER IN (unit,mode)(first,last)

unit    An integer constant or variable that represents the logical unit number.

mode    An integer constant or variable that specifies the recording mode of the data
        being read. The permitted values are:

        0   7-track BCD
        1   7-track or 9-track binary
        2   CDC 64 character ASCII subset
        4   Mass storage (disk)
        5   Archive
        6   Front-End Processor (FEP)

first   A Level 2 array reference defining the first location in the buffer into which
        data is to be transmitted. The transmission continues from that point to the
        location specified by parameter last. The array name used can be type
        character, integer, real, double precision, or complex.

last    A Level 2 array reference defining the location in the buffer into which the
        last data item is to be transmitted. The location designated by the parameter
        first must be less than or equal to the location designated by parameter last;
        both must refer to the same array. The array name used can be type character,
        integer, real, double precision, or complex.


The BUFFER IN statement initiates data transmission from the logical unit to the buffer. Before data in 
the buffer can be used, the status of the data transmission must be checked using the UNIT function. 


Example 

BUFFER IN (5,2) (X(1),X(10)) 


UNIT Function 

The UNIT function is suitable for evaluation in an arithmetic IF statement that causes branching to 
appropriate statements as directed by the value returned. Failure to perform a unit status check renders
unpredictable the input that is transferred to the buffer by the preceding BUFFER IN statement. 

Form 


UNIT (u) 

u An integer constant or variable that represents the logical unit number. 

The function returns one of the following real values:

-1.0    Unit ready
 0.0    Unit ready; end of file encountered
 1.0    Unit ready; parity error encountered

BUFFER OUT Statement 

The execution of the BUFFER OUT statement transfers data to the unit specified, in the mode given,
from Level 2 storage locations first to last.

Form

BUFFER OUT (unit,mode)(first,last)

unit    An integer constant or variable that specifies the logical unit number.

mode    An integer constant or variable that specifies the mode in which the data
        record is to be written:

        0   7-track BCD
        1   7-track or 9-track binary
        2   CDC 64 character ASCII subset
        4   Mass storage (disk)
        5   Archive
        6   Front-End Processor (FEP)

first   A Level 2 array reference defining the first location in the buffer from which
        data is to be transmitted. The array name used can be type character, real,
        integer, double precision, or complex.

last    A Level 2 array reference defining the location in the buffer from which the
        last data item is to be transmitted. One logical record is written for each
        BUFFER OUT statement.

Example 

BUFFER OUT (6,3) (X(1),X(10)) 

Extensions for Backing Store/Main Memory Transfers 

For transfers between Backing Store and Main Memory, the BUFFER IN and BUFFER OUT statements are 
extended as follows: 

BUFFER IN (Level 2 array reference) (first, last)

BUFFER OUT (Level 2 array reference) (first, last)

Level 2 array reference
        An array name or array reference (subscripted array name) to an
        array declared to be in Level 2 memory (Backing Store).

first   A Level 1 array reference defining the first location in the buffer into
        which data is transmitted. The transmission continues from that point
        to the location specified by parameter ‘last’. All transmissions are in
        integral number of 32,768-word blocks.

last    A Level 1 array reference defining the location in the buffer into which
        the last data item is to be transmitted. The location designated by the
        parameter ‘first’ must be less than or equal to the location designated by
        the parameter ‘last’; both must refer to the same array.

The UNIT function is also extended as follows:

UNIT (level 2 array reference)

which returns the following real values:

-1.0    transmission to or from referenced array is complete
 0.0    transmission to or from referenced array not complete
 1.0    transmission to or from referenced array cannot be completed (Backing Store
        locked out, or data not present)

The compiler attempts to assign arrays in Backing Store and in Main Memory to large block (LB) 
boundaries (32,768 64-bit word segments). If the BUFFER IN, BUFFER OUT statements reference 
integral blocks, the compiler generates direct SWAP instructions. If the block being transferred does not 
align to a block boundary, or if less than an integral block is transferred, the compiler generates a SWAP 
to an intermediate Main Memory block, then generates ‘in-line’ code to move the sub-block (or partial 
block) to its appropriate Main Memory location. The same operation is performed in reverse for BUFFER
OUT statements, with sub-blocks being moved to Main Memory intermediate blocks and then a Backing 
Store SWAP initiated. 
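
The choice between a direct SWAP and a staged transfer can be sketched in Python as executable pseudocode. The function `plan_buffer_in`, the step names and the use of 0-based word offsets are assumptions made for illustration; only the 32,768-word block size and the two cases (aligned whole blocks versus partial or unaligned data) come from the text.

```python
BLOCK_WORDS = 32768                         # words per large block (LB)


def plan_buffer_in(start_word, n_words):
    """List the steps for a Backing Store to Main Memory transfer.

    start_word is the 0-based word offset of the first word transferred.
    """
    aligned = start_word % BLOCK_WORDS == 0 and n_words % BLOCK_WORDS == 0
    first_blk = start_word // BLOCK_WORDS
    if aligned and n_words > 0:
        # Whole blocks on block boundaries: direct SWAP instructions.
        return [("SWAP", blk)
                for blk in range(first_blk, first_blk + n_words // BLOCK_WORDS)]
    steps = []
    end_word = start_word + n_words
    last_blk = (end_word - 1) // BLOCK_WORDS
    for blk in range(first_blk, last_blk + 1):
        lo = max(start_word, blk * BLOCK_WORDS)
        hi = min(end_word, (blk + 1) * BLOCK_WORDS)
        steps.append(("SWAP_TO_INTERMEDIATE", blk))   # whole block staged
        steps.append(("MOVE", lo, hi - lo))           # in-line move of the part
    return steps


direct = plan_buffer_in(0, 2 * BLOCK_WORDS)
partial = plan_buffer_in(0, 10000)
straddle = plan_buffer_in(30000, 5000)
```

A BUFFER OUT plan would run the same steps in reverse order, moving the sub-blocks into the intermediate block before the SWAP is initiated.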

Because of the compiler’s attempts at assignment of arrays for optimum transfers, the programmer should 
be cautioned that arrays are not necessarily stored sequentially to one another by the compiler. Thus the 
statement: 

DIMENSION A(100,100),B(100,100)

does not imply that B(1,1) immediately follows A(100,100) in actual memory.

The example: 

DIMENSION Q(100,100),R(100,100)
LEVEL 2 QB(100,100,100),RB(100,100,100)
BUFFER IN (QB(1,1,1)) (Q(1,1),Q(100,100))

would move 10,000 elements beginning at the first block of QB to the array Q in Main Memory. To 
determine if the final transfer is completed the programmer may use the statement: 

IF UNIT (QB(99,99,100)) 110,120,130 

to branch to the appropriate statement depending on the condition of the transfers underway. Note that
the FMP hardware maintains status information on SWAPS in 32,768-word blocks, thus for a BUFFER IN
operation on block boundaries for an array X(100000) the UNIT statement:

IF UNIT (X(1)) 1,2,3

is equivalent in function to the statement:

IF UNIT (X(32768)) 1,2,3

since the hardware flag tested by the object code is identical in both cases. 
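
The block granularity of the status flag, and the three-way branch taken on the UNIT result, can be sketched in Python as executable pseudocode. The helper names are hypothetical, and the sketch assumes the array starts on a block boundary, as in the example above.

```python
BLOCK_WORDS = 32768


def status_block(element):
    """Block whose status flag covers a 1-based element of a block-aligned array."""
    return (element - 1) // BLOCK_WORDS


def arithmetic_if(value, negative, zero, positive):
    """Three-way branch of a FORTRAN arithmetic IF: pick a label by sign."""
    if value < 0:
        return negative
    if value == 0:
        return zero
    return positive


# X(1) and X(32768) lie in the same block, so they test the same flag.
same_flag = status_block(1) == status_block(32768)

# UNIT returns -1.0 (complete), 0.0 (not complete) or 1.0 (cannot complete).
label_when_complete = arithmetic_if(-1.0, 1, 2, 3)
label_when_pending = arithmetic_if(0.0, 1, 2, 3)
```

By the same arithmetic, X(32769) falls in the next block and would test a different flag.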
The Specification


The philosophy governing the introduction of the aforementioned language extensions has been to minimize 
change in language or compiler. The next phase of this project must produce a full scale programming 
language specification which can be used to procure and implement an applications programming language 
for the FMP. 

The first item that must be resolved in that subsequent phase is a choice of the FORTRAN base language, 
FORTRAN EXTENDED, FORTRAN 66 or FORTRAN 78. This decision will have to result from meetings 
between staff members from NASA and the RADL investigators, with schedule risk being a preeminent 
consideration. 

The compiler specification therefore must await this same decision. The only specification possible at this
time is the set of hand-compiled examples given in Section 2 and the description of probable compiler actions
given in the preceding discussions of CODO and BUFFER IN/BUFFER OUT.
OPERATING SYSTEM FUNCTIONAL REQUIREMENTS - FLOW MODEL PROCESSOR SYSTEM


SYSTEM PHILOSOPHY 

Three factors drive the architecture of the FMP operating system (FMPOS):

• Minimization of new software development, 

• Reduction of overhead within the FMP CPU, 

• Balance of system resources. 

The development schedule for the FMP system precludes a massive development of software to support all
of the functions commonly associated with general-purpose computing facilities. Achieving the required level of
total system stability, reliability, and availability implies that a substantially constrained set of functions be
allocated to the FMP CPU operating system, and that existing software be exploited in all attached processors
to the maximum extent possible.

The main purpose of the FMP is to perform massive amounts of computation on highly vectorized
mathematical codes. The objective of the total system installation, therefore, is to maximize the amount
of time that the FMP is operating at its peak speeds. First and foremost, the language processor and
related documentation must ensure that the actual computations are matched to the hardware architecture.
Secondarily, as little FMP resource as possible should be tied up in the management of internal FMP
functions such as memory allocation. The functional constraints on the FMP serve to reduce both the
space (in main memory) and the time (usually using inefficient scalar code) required by the CPU-based
portion of the operating system.

System balance is an important and obvious consideration as the power of the FMP could be quickly 
dissipated by bottlenecks in input or output or in the scheduling of system resources (such as disk space) 
by other supporting processors. 

The preceding imperatives point to the need for a system philosophy around which a system design can
be formulated and upon which the total system implementation can be based. The approach taken for 
the FMP is an extension of the “distributed system philosophy” originally evolved for the Control Data 
STAR computer systems. 


Distribution of Functions 


The allocation of system functions should be governed by some basic guiding principles which can be 
used for both hardware and software implementations. A suggested set of such guidelines is offered
here: 

1. System resources consist of storage and processing facilities. Storage can be central memory, 
backing store memory, disk drives or magnetic tape devices. Processing resources could be the 
FMP CPU, miniprocessors handling system control, or other computers providing support 
functions. 

2. Management of resources is performed by computers which can be programmed to make 
intelligent decisions, and to perform whatever control and management functions may be 
assigned to them. 
3. The management of resources should be placed as close as possible (electronically, physically,
and logically) to the resource being managed. Thus a disk management function should be 
allocated to a processor which may actually reside within the disk unit or in the disk 
controller which is normally intimately connected to the physical storage units. 

4. All such resource management functions should be moved outward from the central computer
toward the particular resource. 

5. Form follows function; the hardware should be built to fit all of the functions which have been
moved outwards to the resource, rather than to fit as many functions as possible into an
existing unit. In place of the word “built” here one could say “sized”, since many standard
computing elements can be used in distributed fashion, but the tendency to ‘go the cheapest
route’ usually results in acquiring a processor too small, into which a partial list of management
functions is then force-fit.

6. An intelligent processor should manage only its own resources and should be ignorant of 
(and thus unable to manage) other processors’ attached resources. 

7. A processor should maintain a list of functions which it is capable of performing. Any
functional requests not in this list should be passed on (if the processor is a communications
node) or not acknowledged (if the processor is part of a network). Thus no processor rejects
requests unless they are patently illegal, and therefore no processor need know what functions
other processors are capable of performing.

8. All communications between processing elements must be through a single, highly structured
message system, with rigorous attention paid to message formats and protocols.

9. All the preceding principles must be tempered with common sense and technological and
economic realities. 
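
Guideline 7 can be sketched in Python as executable pseudocode. The class `Processor`, its `next_hop` attribute and the function names in the example are all hypothetical; the sketch shows only the rule that a processor serves what is on its own list, forwards anything else if it is a communications node, and otherwise stays silent.

```python
class Processor:
    """A node that knows only its own function list (guideline 7)."""

    def __init__(self, name, functions, next_hop=None):
        self.name = name
        self.functions = dict(functions)    # function name -> handler
        self.next_hop = next_hop            # set only for communications nodes

    def request(self, function, argument):
        handler = self.functions.get(function)
        if handler is not None:
            return self.name, handler(argument)
        if self.next_hop is not None:       # communications node: pass it on
            return self.next_hop.request(function, argument)
        return None                         # network member: no acknowledgement


disk = Processor("disk", {"read_block": lambda n: f"data from block {n}"})
node = Processor("comm_node", {}, next_hop=disk)

served = node.request("read_block", 7)      # forwarded to, and served by, disk
ignored = disk.request("draw_mesh", None)   # not on disk's list: ignored
```

Neither object holds a table of what the other can do, which is the point of the guideline.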

The result of the application of this set of ground rules is manifested in visible system features such as
processors whose sole responsibility is the management of files for all other processors in a given complex.
This is the ultimate extension of the process, whereby first a processor and software are created to manage
the motion of a disk arm and the reading and writing of data bits on the magnetic media; thence the
ability is added to that processor (which is nearly imbedded within the disk unit) to handle error
detection, retry and some recovery of the data recorded on the disk; then further diagnostic ability,
management of the disk space, and finally the management of the files on that particular disk are added.

A list of functions to be distributed to a multiplicity of processors then would include:

1. File management — Control of access, security, backup and error handling, space allocation.

2. Communications handling — Management of all remote access trunks, logon validation, recovery,
scheduling of resources activated by the remote devices.

3. Trunk management — Control of the network that interconnects the collection of resources
and processors.

4. Special processor control — Independent management of special resources such as the FMP,
graphics processors and archival storage coordinators.


Hardware Interconnection 

The most flexible system organization would permit the interchange of data and control information 
between any set of processors and resources in the system. As the number and variety of processors
grows, the practical methods of interconnection become taxed by physical limitations such as volume and
lengths of cables. The FMP system is based on a network trunk technique (reference section 3.5 of
Functional Specification). 

In this scheme all intelligent processors are connected together by one or more bit-serial trunks on which 
data can be transmitted, or control information interchanged. Each connection is via a programmable 
device controller (PDC) which is itself an intelligent processor. Management of the trunk (which is itself a 
resource) is distributed among all the PDCs on the trunk which deal with contention and scheduling of 
transmissions. 

Each PDC is capable of providing for attachment to four different trunks; however, in the interest of
system availability at least two PDCs will be used at each interconnection. Each of the PDCs will have
access to at least two different trunks.

All system resources (disks, tapes, graphics, archival, communications, special FMP CPU and front-end
processors) will be attached uniformly throughout the network, thus permitting the linking of any system
component with any other.

In general, data transfers are direct from resource to resource without intervention (or store-and-forward) 
by other processors. Thus a high speed disk unit would transmit data directly to the FMP CPU, without 
the data being passed through any other processor (such as the front-end units). The major front-end
processors charged with the file management responsibility in this case would validate the access to the
particular disks, set up the software linkage (so that the FMP knows where the data is physically located),
then step out of the way (logically) while the high-speed data transfers take place.

While the system is on-line, any device or processor can be logically, or even physically, removed from the 
network without disrupting operation, as long as that resource is not required for a particular computation
during the time of removal. 
Software Interconnection


The total interconnectability offered by the hardware aspects of the network trunk can be constrained 
by the software system to appear as a variety of traditional interconnection schemes (such as a STAR 
organized network). The choice of constraining the system interconnections must be based on: 

1. Desire to eliminate a multiplicity of interface modules to be written; despite a generalized
message structure, the act of linking a PDP-10 graphics computer to the FMP would require
some interface logic different from that needed when linking to a disk or a front-end processor.

2. Format conversion — To produce the most cost-effective system it is desirable to adapt any
number of existing, proven hardware devices to the purposes of the FMP. Raw data formats, and even
internal arithmetic formats, are rarely identical, thus there exists a need for software to provide
conversions. These impose overheads in space and time and also require programming and
checkout resources, which may be in limited supply. The need to reduce the number and
variety of this type of module is great.

3. The need to restrict access to certain resources — For security or system efficiency reasons
it may be desirable to limit certain interchanges. Thus the graphics processor has no need to
speak directly to the FMP for any reason, and vice versa, despite the fact that they may be
attached to a common trunk for purposes of attachment to the high-speed disk units.

The software must provide, via modifiable tables, a means for defining the apparent interconnection of all
devices on the trunk. The network trunk provides an eight-bit address for each unit attached. At the
base hardware level, any device can have its address established by manually changing the setting of a
series of keylocked switches. The PDC can have loaded into itself at system startup time a series of
software addresses (or address-like structures). Finally, each attached processor will have its own higher-level
addressing structure.
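
A modifiable interconnection table of this kind can be sketched in Python as executable pseudocode. The class `TrunkMap`, its method names and the example addresses are hypothetical; only the eight-bit address range and the idea of restricting which units may exchange messages come from the text.

```python
class TrunkMap:
    """Table of trunk addresses and the links the software permits."""

    def __init__(self):
        self.address = {}                   # unit name -> eight-bit address
        self.allowed = set()                # unordered pairs of addresses

    def attach(self, name, addr):
        if not 0 <= addr <= 0xFF:
            raise ValueError("trunk addresses are eight bits")
        if addr in self.address.values():
            raise ValueError("address already in use")
        self.address[name] = addr

    def permit(self, a, b):
        self.allowed.add(frozenset((self.address[a], self.address[b])))

    def may_talk(self, a, b):
        return frozenset((self.address[a], self.address[b])) in self.allowed


trunk = TrunkMap()
trunk.attach("FMP", 0x01)                   # example addresses, chosen arbitrarily
trunk.attach("GRAPHICS", 0x02)
trunk.attach("DISK", 0x10)
trunk.permit("FMP", "DISK")
trunk.permit("GRAPHICS", "DISK")
```

With this table both the FMP and the graphics processor can reach the disk, while `trunk.may_talk("FMP", "GRAPHICS")` is false even though all three share one trunk.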


Messages, Structure and Discipline 

To keep any system of cooperating but asynchronous processors from degenerating instantly into a state of 
electronic chaos, a rigid set of protocols must be defined and adhered to rigorously. The only (emphasize 
the word only) means of intercommunication is through a predefined set of system messages. There 
cannot be any sneak paths or extra wires used for that one special case. The rule is simply that if a 
function cannot be handled efficiently within the message system, then either the message system must be
revised or the function abandoned. There can be no equivocation on this rule, lest the system totally 
collapse from special-casing. 

When dealing with the concept of messages, several levels of system consciousness or message envelopes 
must be defined: 

1. Trunk protocol — Each PDC, when communicating with another PDC on the trunk, builds a 
basic trunk protocol envelope around the data being transmitted, and decomposes the envelope 
from data received (Reference section 3.5 of the Functional Specification). 

2. Processor Protocol — Each processor involved in interchanging information on the trunk places 
another level of envelope around the data being exchanged. Whereas the trunk messages involve 
hardware-oriented items such as hardware address codes and cyclic error checking, this second 
level is defined by the software portion of the operating system. The format and contents
are addressed to the methods wherein the messages are stored, queued, routed, and decoded
within and by each processor. Suggested formats and a list of message types to be implemented
for the FMP are presented below.



3. The highest level messages are those exchanged between job-level programs executing in the
major computer processors (FMP, front-end, graphics and archival store manager). These
messages are primarily for control purposes rather than for the exchange of quantities of data.
An example would be the chit-chat between the interactive graphics processor producing the
displays, and a mesh generation and stretching program residing in the front-end processor
during a session wherein the aerodynamicist is interactively modifying a mesh structure.
Specification of this set of messages will have to await further phases of the FMP development 
project. 
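
The nesting of the first two envelope levels can be sketched as follows. This is only an illustration under stated assumptions: the field names, the one-byte message type, and the use of a 32-bit CRC from Python's zlib module as the "cyclic error checking" are all hypothetical stand-ins, since the actual trunk format is defined in section 3.5 of the Functional Specification and is not reproduced here. Only the eight-bit trunk address is taken from the text.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessorEnvelope:
    """Second-level (software-defined) envelope; fields are hypothetical."""
    message_type: int   # assumed one-byte type code
    payload: bytes      # job-level message or data being exchanged

    def pack(self) -> bytes:
        return bytes([self.message_type]) + self.payload


@dataclass(frozen=True)
class TrunkEnvelope:
    """First-level (hardware-oriented) envelope built by a PDC."""
    dest: int           # eight-bit trunk address of the receiving unit
    source: int         # eight-bit trunk address of the sending unit
    body: bytes         # packed processor-level envelope

    def pack(self) -> bytes:
        for address in (self.dest, self.source):
            if not 0 <= address <= 0xFF:
                raise ValueError("trunk addresses are eight bits")
        frame = bytes([self.dest, self.source]) + self.body
        # CRC-32 stands in for the trunk's cyclic check (assumption).
        return frame + zlib.crc32(frame).to_bytes(4, "big")


def unpack_trunk(frame: bytes) -> TrunkEnvelope:
    """Decompose a received trunk envelope, verifying the cyclic check."""
    if len(frame) < 6:
        raise ValueError("frame too short")
    data, check = frame[:-4], frame[-4:]
    if zlib.crc32(data).to_bytes(4, "big") != check:
        raise ValueError("cyclic check failed")
    return TrunkEnvelope(dest=data[0], source=data[1], body=data[2:])
```

A receiving PDC would call unpack_trunk and hand only the body to its processor, so neither level needs to understand the other's contents.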


Message Formats and Types 

User programs may issue messages which result in the performance of system functions. To issue a
message, the user presets a 2- or more word block according to the Alpha and Beta conventions described 
below, and performs an Exit Force instruction (09) that transfers control to the operating system monitor 
mode. 

Immediately following the Exit Force instruction in the instruction stream is either a 32-bit indirect or a
64-bit direct message pointer. Hexadecimal format of an indirect message pointer is:

00EE00rr

rr Register containing the address of the message.

The hexadecimal format of a direct message pointer is:

00FFaaaaaaaaaaaa

a’s Address of the first full word of the message. 
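
As a minimal sketch, the two pointer formats can be built and recognized as shown here. It assumes the patterns read as 00EE00rr (32 bits, eight-bit register designator) and 00FF followed by twelve hexadecimal address digits (64 bits, 48-bit address); the function names are hypothetical and not part of the operating system.

```python
INDIRECT_TAG = 0x00EE   # leading 16 bits of a 32-bit indirect pointer
DIRECT_TAG = 0x00FF     # leading 16 bits of a 64-bit direct pointer


def indirect_pointer(register: int) -> int:
    """Build the 32-bit pointer 00EE00rr for register designator rr."""
    if not 0 <= register <= 0xFF:
        raise ValueError("register designator must fit in eight bits")
    return (INDIRECT_TAG << 16) | register


def direct_pointer(address: int) -> int:
    """Build the 64-bit pointer 00FFaaaaaaaaaaaa for a 48-bit address."""
    if not 0 <= address < (1 << 48):
        raise ValueError("address must fit in 48 bits")
    return (DIRECT_TAG << 48) | address


def decode_pointer(first_half_word: int, next_half_word: int = 0):
    """Classify a pointer from the 32 bits following Exit Force.

    For a direct pointer the second 32-bit half word supplies the low
    address bits.
    """
    tag = first_half_word >> 16
    if tag == INDIRECT_TAG:
        return ("indirect", first_half_word & 0xFF)
    if tag == DIRECT_TAG:
        return ("direct", ((first_half_word & 0xFFFF) << 32) | next_half_word)
    raise ValueError("not a message pointer")
```

For example, format(indirect_pointer(0x1A), "08X") gives 00EE001A.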

The message has a two-part standard format. The Alpha (first) portion specifies the function to be
performed, length of parameter list, and where to proceed for error processing. The Alpha portion has the
same general format for all messages.

The Beta (second) portion contains the parameters. The format of the Beta portion depends on the
function, as described later for each function code. Alpha and Beta words must start on full-word
boundaries and must exist in space which has read/write or write temporary access.

When a message is processed without error, the operating system returns control to the next half or 
full word immediately following the message pointer. Thus, calls can be chained by placing one message 
pointer directly behind another. 

Alpha format: 


Alpha (1) 

Alpha (2)

Alpha (3)
(optional)

r Hexadecimal response code returned by the operating system when message has been processed.
If no error occurred, the response code is zero (exceptions: f=0013, f=0016 and f=0017).

The significance of a non-zero response code varies as described for each function code. 

len If len = FFFF, Alpha (3) contains the length and bit address of the Beta portion. Other- 
wise, Beta is assumed to begin at Alpha (3) and len is the length of the Beta portion. 

c This field varies with the message; usually, it specifies function options or controls. 

f Specifies function to be performed (hexadecimal message code). 

n May specify option or control, may contain a parameter for the message, or may be a 
parameter returned during message processing.

eea Bit address that receives control if error occurs. This address must lie within the program 
issuing the message. If eea = 0, the error is considered fatal to the further execution of this 
user process. 

bl If the Beta and Alpha portions are not contiguous (len = FFFF), this parameter indicates 
Beta length in full words. 

ba If Beta and Alpha portions are not contiguous (len = FFFF), this parameter indicates address 
of Beta portion’s first full word. 
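
The rule for locating the Beta portion from the len, bl, and ba fields can be sketched as follows. For simplicity this sketch uses full-word indices rather than the bit addresses used by the system, and it assumes that "Beta begins at Alpha (3)" means two full words past Alpha (1); the function name is hypothetical.

```python
NOT_CONTIGUOUS = 0xFFFF   # value of len signalling a separate Beta portion


def locate_beta(alpha_word, len_field, bl=None, ba=None):
    """Return (first_word, length_in_words) of the Beta portion.

    alpha_word is the full-word index of Alpha (1). bl and ba are the
    length and address fields of Alpha (3), needed only when
    len_field is FFFF.
    """
    if len_field == NOT_CONTIGUOUS:
        if bl is None or ba is None:
            raise ValueError("len = FFFF requires bl and ba in Alpha (3)")
        return ba, bl
    # Contiguous case: Beta occupies the position of Alpha (3).
    return alpha_word + 2, len_field
```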

The terms controller and controllee have specific meaning relative to FMPOS. For example, a batch 
processor also controls actions of a user program; the former becomes the controller and the latter, 
controllee. This relationship between programs can exist in other ways as well: one program can
initialize and/or direct the actions of another.

Since FMPOS is a file-oriented system, file management is an important aspect of the operating system.
Although FMPOS takes little direct responsibility for action on a given file, a set of user messages allows
a degree of latitude in directing FMPOS processing for a given file. Standard messages also transmit
information between programs operating in controller-controllee mode. The messages are calls to the
system; they are shown in the following table by the alphabetical name of the message. 


MESSAGE FUNCTION CODES


Message                                   Function Code†

CHANGE FILE NAME OR ACCOUNT NUMBER        0B
CLOSE FILE                                05
CREATE FILE                               01
DESTROY FILE                              02
EXECUTE OPERATOR COMMAND††                21
EXECUTE PROGRAM FOR USER NUMBER††         22
EXPLICIT I/O                              50
GET MESSAGE FROM CONTROLLEE               17
GET MESSAGE FROM CONTROLLER               16
GET PACKLABEL AND PFI                     11
GIVE FILE                                 08
GIVE TAPE ACCESS TO CONTROLLEE            0C
GIVE UP CPU UNTIL I/O COMPLETES           52
INITIALIZE CONTROLLEE CHAIN               1D
INITIALIZE OR DISCONNECT CONTROLLEE       1B
KEEP                                      28
LIST CONTROLLEE CHAIN                     13
LIST FILE INDEX OR SYSTEM TABLE           09
MAP                                       04
MESSAGE CONTROL                           18
MISCELLANEOUS                             24
OPEN FILE                                 03
POOL FILE MANAGER                         26
PROGRAM INTERRUPT                         1C
RECALL                                    25
REDUCE FILE LENGTH                        0A
REMOVE CONTROLLEE FROM MAIN MEMORY        19
RETURN FROM INTERRUPT                     51
ROUTE AND FILE DISPOSITION                0D
SEND MESSAGE TO CONTROLLEE                15
SEND MESSAGE TO CONTROLLER                14
SEND MESSAGE TO OPERATOR                  1A
SET ALL PERM FLAG††                       2A
TERMINATE                                 06
UPDATE USER DIRECTORY††                   23
USER/ACCOUNTING COMMUNICATION             0E

† Reserved for future use:

    07  ADVISE
    20  ABNORMAL TERMINATION INTERRUPT
    27  LINK SYSTEM CALL
    29  SEND MESSAGE TO DAYFILE

    1E, 1F, E0-FF  Reserved for installation use

†† Available to a privileged task only
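
A dispatcher consulting this table might screen an incoming function code along the following lines. The sketch carries only an excerpt of the table, reads the letter-like characters in the codes as hexadecimal digits (for example 1E, 1F, E0-FF), and uses hypothetical names and return strings; it is not the FMPOS implementation.

```python
# Excerpt of the function code table; the full table has 36 entries.
ASSIGNED = {
    0x01: "CREATE FILE",
    0x02: "DESTROY FILE",
    0x03: "OPEN FILE",
    0x05: "CLOSE FILE",
    0x21: "EXECUTE OPERATOR COMMAND",
    0x50: "EXPLICIT I/O",
}
# Codes footnoted as available to a privileged task only.
PRIVILEGED = {0x21, 0x22, 0x23, 0x2A}
# Codes footnoted as reserved for future use.
FUTURE_USE = {0x07, 0x20, 0x27, 0x29}


def is_installation_code(code: int) -> bool:
    """True for codes reserved for installation use (1E, 1F, E0-FF)."""
    return code in (0x1E, 0x1F) or 0xE0 <= code <= 0xFF


def screen_call(code: int, privileged_task: bool) -> str:
    """Classify a function code before dispatching it."""
    if is_installation_code(code):
        return "installation"
    if code in FUTURE_USE:
        return "reserved"
    if code in PRIVILEGED and not privileged_task:
        return "denied"
    if code in ASSIGNED or code in PRIVILEGED:
        return "ok"
    return "not in excerpt"
```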
THE FMP MONITOR


The FMP CPU hardware is being designed with a particular mode of system operation in mind. The 
distribution of functions to the other processors attached to the network trunk frees the FMP from many 
of the conventional operating system chores. Thus the hardware design permits a total of 65,536 64-bit 
words to be utilized by the monitor. Since there is no direct input or output to the main memory and 
since the user has dominion over all the remaining memory, there will be no (repeat — no!) operating 
system overlays. The absolute maximum is 65,536 words. Certain hardware instructions have been
provided to assist the monitor in its resource management functions; other instructions have been
consciously omitted to inhibit the desire to add one more feature to the system. 

Allocation of FMP Resources

In keeping with the distributed system philosophy, the FMP monitor need only manage the storage 
available to it (Main Memory and Backing Store; the register file and vector buffers are the responsibility
of the compiler system), and the computing resources (or which job is to be loaded and executed next).

Backing Store 

The 256-million-word Backing Store is managed in units of 32,768-word blocks. All data transmissions are 
accomplished in that same size block; however, the input/output channel PDCs may decompose the blocks
to smaller increments for transmission on the trunk. 

In the initial configuration, there are 8192 blocks of 32,768 words that can be managed. Any program 
executing in monitor mode can address any block in the Backing Store. Programs operating in job mode 
can reference a contiguous band of Backing Store as established by the monitor. The monitor then must 
be able to provide the following facilities: 

• Allocation of Backing Storage for the entire program and data base for a job being loaded
from the network trunk. This allocation is based on the queued list of jobs submitted by the
front-end processor for execution. Included in the queued information are the space
requirements for the job execution.

• Setup of the base address register and field length register for the job in execution to enable 
that job to reference the Backing Store. 

• Deallocation of storage as the completed (or aborted) job’s data is rolled out onto the network 
trunk. 

• Allocation of small blocks of Backing Store for on-line diagnostic storage, I/O lists (see I/O 
Handling, below), and system statistics buffers. 

• Freezing of blocks in the Backing Storage when explicit input/output requests involve them, 
freeing the same blocks on completion of I/O actions. 

• Developing accounting information for billings based on storage usage over time.

• On-line exercise (periodically) of the Backing Store map table (which interlocks the use of
Backing Store) and any other facilities attached to the Backing Store, to verify that everything 
is still working. 

• When running a series of small jobs that require small amounts of Backing Store, maintaining
a table of space allocation for all such jobs.
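
The block arithmetic and several of the facilities listed above (contiguous allocation, freezing blocks during I/O, deallocation) can be sketched as a small map table. The class, its method names, and the first-fit search are assumptions made for illustration; the report does not specify an allocation algorithm. "256-million-word" is read here as 2**28 words, which reproduces the 8192-block figure.

```python
BLOCK_WORDS = 32_768
TOTAL_WORDS = 256 * 2**20                    # "256-million-word" Backing Store
TOTAL_BLOCKS = TOTAL_WORDS // BLOCK_WORDS    # 8192, matching the text


class BackingStoreMap:
    """Hypothetical map table: one owner slot and one busy flag per block."""

    def __init__(self, blocks=TOTAL_BLOCKS):
        self.owner = [None] * blocks
        self.busy = [False] * blocks

    def allocate(self, job_id, n_blocks):
        """First-fit search for a contiguous band; return its base or None."""
        if n_blocks <= 0:
            raise ValueError("n_blocks must be positive")
        run = 0
        for index, owner in enumerate(self.owner):
            run = run + 1 if owner is None else 0
            if run == n_blocks:
                base = index - n_blocks + 1
                self.owner[base:index + 1] = [job_id] * n_blocks
                return base
        return None

    def set_busy(self, base, n_blocks, flag):
        """Freeze (True) or free (False) blocks around an I/O action."""
        if base < 0 or n_blocks < 0 or base + n_blocks > len(self.busy):
            raise IndexError("block range outside the Backing Store")
        self.busy[base:base + n_blocks] = [flag] * n_blocks

    def release(self, job_id):
        """Deallocate a job's blocks; refuse while any is still busy."""
        mine = [i for i, owner in enumerate(self.owner) if owner == job_id]
        if any(self.busy[i] for i in mine):
            raise RuntimeError("job still has busy blocks")
        for i in mine:
            self.owner[i] = None
```

The base block returned by allocate, together with the block count, corresponds to what the monitor would place in the base address and field length registers for the job.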
Main Memory


Only one job is intended to reside in the Main Memory at a time, thus the monitor need provide 
no special facilities for memory allocation.

The management of Main Memory as a resource coincides exactly in form and content with the 
allocation of the computing resource, which follows. 


Functional Units 

The front-end processors are responsible for the organization, staging, and scheduling of jobs to be
submitted to the FMP. Once a job is fully staged by the FEP (front-end processor) an FEP to FMP
monitor message is transmitted on the trunk. This is a type 2 message, which gives the following data:

Job I.D. (generated by the FEP)

Backing Store and Main Memory requirements
I/O list for the staged job (see I/O Handling, below) 

Time limit (if job exceeds the limit — abort — ) 

Relative priority 

I/O list for files to be accessed with explicit I/O 

The monitor allocates Backing Store for the I/O lists and queues the remainder of the message in a 
sixteen-job queue (maximum allowed is 256 jobs, but that appears to be excessive). When the job 
in progress completes, the monitor initiates the roll-out of that job and examines its job queues
(including diagnostics that might be invoked on periodic schedules). The job with the highest priority 
that will fit the available memory (in the event that the FMP is operating in degraded memory mode) 
will be rolled in: 

• Contiguous block space must first be allocated to the job coming in. This may involve
collection of disparate groups of blocks that have become diffused in the Backing Store
during explicit I/O or small job executions. 

• The file/files containing the job data to fill the block space are physically defined by the I/O 
lists. The lists for the incoming job are pointed to by the queued job request held in 
monitor’s personal area in the 65,536 main memory block reserved for monitor. Monitor 
transmits this pointer to the PDCs on the I/O channel which then access the lists and perform 
the data loading functions. Prior to initiating the I/O action, monitor sets all affected 
blocks ‘busy’ in the Backing Store map table. 

• As blocks are loaded by the PDC, it returns a response to the monitor which verifies that 
that portion of the operation was properly completed, and clears the map table busy 
flags. Upon completion of the last block, the monitor sets a ready flag in the job queue. 

• When a job is in the ready state, monitor transmits a message to the FEPs indicating which 
job has been staged to the Backing Store, along with a time stamp. This permits the FEPs
to maintain an audit trail of the progress of a given job.

• When the job in the CPU completes execution, its entire Main Memory contents are swapped 
to its Backing Store blocks. The job in ready state is then swapped into Main Memory, the 
Backing Store RA+FL (reference base address and field length) register set, the monitor interval 
timer set with the job time limit and an exchange operation initiated which puts the job into 
execution. 
• As the exchange operation is initiated the monitor sends another time-stamped system message
to the FEPs to alert them that the job is now executing. 

• When the job completes and has been swapped to the Backing Store, monitor alerts the 
PDC, gives it the I/O list pointers, and thus initiates the roll-out operation after setting all
the respective Backing Store blocks busy. 

• On job completion, an end-of-job message is transmitted to the FEPs.

• When the roll-out is completed, monitor sets all relevant blocks not busy, updates its own 
chart of available memory, sends a final roll-out time-stamped message to the FEPs, and 
examines its jobs waiting queue for the next job to be preloaded into the Backing Store. 
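
The selection rule (highest priority job that fits the available memory) and the sequence of job states described in the steps above can be sketched as follows. The record fields mirror the type 2 message contents, but the field names, the state names, and the convention that a larger number means higher priority are assumptions.

```python
from dataclasses import dataclass

# State names are hypothetical labels for the steps described in the text.
LIFECYCLE = ("queued", "loading", "ready", "executing", "rolling_out", "complete")


@dataclass
class QueuedJob:
    job_id: str
    priority: int          # assumption: larger value = more urgent
    backing_blocks: int    # Backing Store requirement, in 32,768-word blocks
    main_words: int        # Main Memory requirement, in words
    time_limit_s: int
    state: str = "queued"


def select_next(queue, free_blocks, main_words_available):
    """Pick the highest-priority queued job that fits the available memory."""
    fitting = [
        job for job in queue
        if job.state == "queued"
        and job.backing_blocks <= free_blocks
        and job.main_words <= main_words_available
    ]
    if not fitting:
        return None
    # max() keeps the earliest entry on ties, preserving queue order.
    return max(fitting, key=lambda job: job.priority)


def advance(job):
    """Move a job to the next lifecycle state and return the new state."""
    position = LIFECYCLE.index(job.state)
    if position + 1 < len(LIFECYCLE):
        job.state = LIFECYCLE[position + 1]
    return job.state
```

In degraded memory mode the caller would simply pass smaller free_blocks and main_words_available values, so that oversized jobs are skipped rather than rejected.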


PDC Communication with the Monitor 

Upon completion of requested actions by the PDC, it loads 64 bits of software-defined status information 
into the channel status word. The monitor periodically polls each channel with an 02 instruction
(Transmit (R) to Channel (S) and Channel (S) to (T)). 

The software definition of the included bits in the status word provides for valid and error terminations 
of the Input/Output request, as well as other information. For example, the monitor might request that 
it be informed of the completion of each block transfer accomplished in a multi-block input/output 
exchange. The PDC would then respond by updating a status word for each transmission. 

Since a block transfer at 200 megabits per second is completed in about 14 milliseconds, a polling rate of once
each 100 microseconds would allow monitor a fairly refined scan of the progress of input/output it has 
requested. 
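
The arithmetic behind this estimate can be checked with a few lines. The sketch reads "200 megabits" as a per-second channel rate and counts only the 64 data bits of each word; on that basis the raw payload time is roughly 10.5 milliseconds, so the quoted 14 milliseconds presumably includes check bits or protocol overhead that the text does not itemize (an assumption).

```python
BLOCK_WORDS = 32_768
WORD_BITS = 64
CHANNEL_BITS_PER_S = 200_000_000   # "200 megabits", read as bits per second
POLL_INTERVAL_US = 100             # one poll each 100 microseconds
QUOTED_BLOCK_TIME_US = 14_000      # "about 14 milliseconds" from the text


def raw_block_time_us() -> float:
    """Payload-only transfer time for one block, in microseconds."""
    return BLOCK_WORDS * WORD_BITS * 1_000_000 / CHANNEL_BITS_PER_S


def polls_per_block(block_time_us: int = QUOTED_BLOCK_TIME_US) -> int:
    """Number of whole polling intervals that fit in one block transfer."""
    return block_time_us // POLL_INTERVAL_US
```

With the quoted figure, polls_per_block() gives 140 status samples per block transfer, which is the "fairly refined scan" the text refers to.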
User/Monitor Communications


All communication between the user job in execution and the FMP monitor is through a message 
structure identical to the format and style given in the section on Messages, Structure, and Discipline 
above. Two methods are used by the user job for pointing to the messages: 

1. Direct — The user job executes an 09 (Exit Force) instruction. The 64-bit quantity 
immediately following the Exit Force instruction contains a hexadecimal FF in the 
leftmost eight bits (bits 0 through 7), indicating that a memory address is contained in 
bits 35-63 of that word. Monitor will fetch this word and use the address to acquire the 
message. All messages must be stored in Main Memory. Instruction execution will be 
continued following the 64-bit word carrying the direct address, after monitor has responded 
to the message. 

2. Indirect — the user job executes an 09 instruction. The 32-bit quantity immediately 
following the Exit Force instruction contains all zeros in bits 0 through 8 indicating that
the rightmost eight bits of the 32-bit quantity contain the register designator of the
register containing the address of the message. Instruction execution continues following the 32-bit
pointer quantity, after monitor has responded to the message.

After interpreting the message and taking appropriate action, the monitor executes an Exit Force (09) 
back to the job to restart it at the point it performed its job to monitor exchange (with its own 
Exit Force instruction). 


Messages 

A basic set of messages are required for assisting the executing job: 

• Read N blocks from file XXX sequentially into Backing Store beginning at address
AAAAAA from current file position. All transfers are in blocks of 32,768 words. 

• Write N blocks to file XXX sequentially from Backing Store beginning at address AAAAAA
from current file position. All transfers are in blocks of 32,768 words. 

• Rewind file to beginning block. 

• Skip forward file N blocks. 

• Close file (note that the user may not open any files; thus the Close operation essentially
locks out the file from further use during this job execution).

• Give file to user UUUU with password PW (used to release an explicit I/O file to tasks on
the front-end processors (FEPs)).

• Reduce time limit for this job to TT seconds. 

• Reduce Backing Store allocation for this job to/by BB blocks. 

• Send message to user UUUU. A user job must be logged on within one of the other processors, 
and enabled to receive messages. The user I.D. is a 16-bit quantity with all I.D.s greater
than 8000 (hexadecimal) reserved for system tasks, and all I.D.s 7FFF (hexadecimal) and
below assigned by the various application programmers.

• Enable message receipt - all other users/users UUUU, YYYY, ZZZZ only. Execute message
processing program at job address AAAAA, save current execution address in job register 01
(data flag branch address).
• Disable message receiving - all users.

• Set error processor address at job address AAAAA. 

• Suspend job temporarily (roll out job until external actions reactivate job).

• Job complete. 

• Job abort, dump following areas of job memory to system error file (normally contains error 
messages and error parameters).
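
The division of the 16-bit user I.D. space given in the Send message item above can be expressed as a short check. Note that, as worded, the text assigns I.D.s greater than 8000 to system tasks and 7FFF and below to application programmers, leaving 8000 itself unassigned; the sketch reports that value as unspecified rather than guessing. The function name is hypothetical.

```python
def classify_user_id(user_id: int) -> str:
    """Classify a 16-bit user I.D. according to the stated ranges."""
    if not 0 <= user_id <= 0xFFFF:
        raise ValueError("user I.D. is a 16-bit quantity")
    if user_id > 0x8000:
        return "system"
    if user_id <= 0x7FFF:
        return "application"
    return "unspecified"   # exactly 8000 hexadecimal: not covered by the text
```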


Exception Handling 

Monitor validates the format and syntax of all messages passed from the user. In the event that a message 
is invalid in these areas the job is aborted, the system error log updated, error messages are sent to the 
FEPs, and the next job is initiated. 

If the function requested is not in the table of monitor capabilities it is automatically passed onto the 
network trunk (see Distribution of Functions under System Philosophy above). The message response area
in monitor for this message is set with a real time value from the current clock. After a delay of PPPPP 
seconds, if no response is returned, the message is considered unachievable. In this instance the monitor 
then enters the job error processing program (if that exit has been set by the appropriate message), or 
aborts the job (if the job did not set an error exit address). 

If the message is responded to but is not achievable due to system errors or a resource being “down”, 
the monitor elects to execute the job error processor (if set up) or aborts the job. 

In all cases the time of message transmission, response and error conditions are recorded in the master
system log, and the system error log (maintained by the FEPs) is updated. In the event of a continued
failure of the system to achieve a particular function or message type, the monitor will suspend operation
and alert the FEPs and the rest of the system. 
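
The handling of a message forwarded to the network trunk reduces to a small decision rule, sketched here. The timeout corresponds to the PPPPP-second delay in the text; the function name, the response labels, and the returned action names are hypothetical.

```python
def resolve_forwarded(sent_at_s, now_s, timeout_s, response, error_exit):
    """Decide what the monitor does with a forwarded message.

    response is None (nothing returned yet), "ok", or "failed" (responded
    to but not achievable). error_exit is the job's error processor
    address, or None if the job never set one.
    """
    if response == "ok":
        return ("resume", None)
    timed_out = response is None and now_s - sent_at_s >= timeout_s
    if response == "failed" or timed_out:
        if error_exit is not None:
            return ("error_exit", error_exit)
        return ("abort", None)
    return ("wait", None)
```

Both failure paths (no response within the delay, and a response reporting that the function is unachievable) end in the same choice between the job's error processor and an abort, which matches the two paragraphs above.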

Monitor/System Communications

Monitor can communicate with the outside world via two avenues: direct control of the input/output 
PDCs (programmable device controllers) with word-oriented messages using the monitor mode 02 (Transmit 
(R) to Channel (S) and Channel (S) to (T)) instruction, or indirect exchange of messages using polling techniques
via the Backing Store. Software in the PDC can interpret the contents of the 64-bit word exchanged by this means 
to perform certain functions. In some cases the rightmost 48 bits of this word will contain an address in 
Backing Store. In other cases the 64-bit data contains several fields for use by the PDC and monitor for 
communicating maintenance, diagnostic, degradation, setup, restart, and other internal functions. Three major 
types of messages are proposed:

• Immediate - The entire message is contained in the one 64-bit word exchanged by the 02
instruction.

• Direct - The function code is contained in the leftmost 16 bits of the exchange word; the
address points to the actual message in Backing Store.

• Indirect - The function code and sequence of system commands are all contained in a list in
Backing Store; the list contains within it the address pointer to the messages to be transmitted.
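
For the Direct type, the 64-bit exchange word carries a 16-bit function code in its leftmost bits and a Backing Store address in its rightmost 48 bits. A minimal packing sketch follows; the function names are hypothetical, and the function code values used by the PDC software are not defined in this section.

```python
ADDRESS_BITS = 48
ADDRESS_MASK = (1 << ADDRESS_BITS) - 1


def pack_direct(function_code: int, address: int) -> int:
    """Build a Direct exchange word: 16-bit code, 48-bit address."""
    if not 0 <= function_code <= 0xFFFF:
        raise ValueError("function code must fit in 16 bits")
    if not 0 <= address <= ADDRESS_MASK:
        raise ValueError("address must fit in 48 bits")
    return (function_code << ADDRESS_BITS) | address


def unpack_direct(word: int):
    """Split a Direct exchange word into (function_code, address)."""
    if not 0 <= word < (1 << 64):
        raise ValueError("exchange word is 64 bits")
    return word >> ADDRESS_BITS, word & ADDRESS_MASK
```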
Messages


• Acknowledge (receipt of incoming monitor or job message, plus time stamp). 

• Reject (cannot perform function requested because it is not in FMP resource table). 

• All job mode messages are legal in monitor mode. 

• Job suspended, rollout complete. 

• Job aborted, dayfile and error log information at DDDDD and EEEE respectively.

• Transmit dayfile information from DDDDD. 

• Transmit error log information from EEEE. 

• Transmit maintenance log information from MMMM.

• Job complete, rollout complete. 

• Where is file. 

• Is file open. 

• Degrade available Backing Store to NNN blocks.

• Check real time clock synchronization.

• Perform I/O operation from I/O list at IIIII.

• Load I/O list at IIIII.

• Reject because resource failed, or job addressed not in FMP.

• Assign new job, queue information at QQQQQ.

Exception Conditions 

All incoming messages pass through the PDC which either plants them in Backing Store (in a block reserved for 
monitor) or sends them direct upon a 02 instruction poll from the monitor. If the monitor fails to poll after a given
period of time, or the function requested is not performed by some set time, the PDC assumes the FMP to be 
disabled and alerts the FEPs. 

If monitor is unable to complete a critical function, such as real time clock synchronization, or if
messages or responses appear garbled, the PDC will alert the FEP. A Maintenance Control Unit function
can then be initiated to determine the condition of the FMP, and bring everything to a halt if need be. 

If monitor is unable to get completion of messages, it will first switch to alternate PDCs, alternate FEP
addressees, and finally halt and alert the Maintenance Control Unit. 
Input/Output Handling


Input and output is controlled by communications between the monitor and the PDC. Actual data
transfers always take place from and to the Backing Store, under control of the PDC. The PDC provides 
addresses to the input/output interface within the Swap Unit, block counts, and function (read or write). 
Data exchange between the PDC and the Swap Unit is monitored directly by the PDC, as its own internal 
counter follows the 32-bit half-word transfers across the input/output trunk. When a 512-word block has 
been fully transferred to the PDC buffer, a trunk input/output operation is initiated (for output from the 
FMP). For input the trunk input/output fills a 512-word buffer in the PDC before the PDC to Swap Unit
transfer is initiated. 

For file transfers the PDC receives a pointer to the input/output list for that file. The input/output list 
contains the following information: 

• Header word containing file identification 

• Open status (read, write, read/write)

• Position pointer into the file map of current location

• First block of file

• Last block of file

• Unit number and logical block number

• Address of first and last entry in input/output list 

• A variable length list of entries giving the disk unit 

• Disk block address and number of consecutive blocks for this segment of the file 

A file is then a collection of data that may be spread over a number of disks, in noncontiguous chunks. 
The file map is kept with the file on the disk and transferred to Backing Store by the FEP (Front-End 
Processor) which opened the file for the executing job. Disk unit addresses are 16 bits long with all 
addresses of 8000 (hexadecimal) or larger specifying that the disks are multiple units using multiple trunks 
and, thus, multiple PDCs for the parallel transfer of data. In the case where monitor detects a multiple 
disk drive file (alternate blocks on each drive) a PDC will be alerted for each trunk, and a separate file 
list pointed to for each PDC. The set of file lists, either one (for one disk transfer), two (for two parallel 
disks transferring alternate blocks), or four (a maximum of four disks can be transferring simultaneously),
are separated into individual input/output lists which define the file. In degraded mode a single PDC can
alternate through the lists, transferring first one block from one disk, the second from another, and so on.
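
The alternate-block layout and the degraded-mode traversal can be sketched with two small functions. The names are hypothetical, and file blocks are represented simply as a Python list; the real input/output lists carry disk unit and block addresses as described above.

```python
def split_alternate(file_blocks, n_disks):
    """Split a file's blocks into per-disk lists, alternating block by block."""
    if n_disks not in (1, 2, 4):
        raise ValueError("files are spread over one, two, or four disks")
    return [file_blocks[i::n_disks] for i in range(n_disks)]


def degraded_order(lists):
    """Order in which a single PDC alternating through the lists moves blocks."""
    ordered = []
    longest = max((len(entries) for entries in lists), default=0)
    for position in range(longest):
        for entries in lists:
            if position < len(entries):
                ordered.append(entries[position])
    return ordered
```

Alternating through the per-disk lists in this way recovers the original block sequence, which is why a single PDC can stand in for several at reduced transfer rate.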

File size is defined by the FEP during job assembly time, and may not be extended by the FMP job or 
its monitor. 

Input and output of data can be carried on with other system components in the same manner as the 
disk system; however, the data transferred need not be in minimum units of 32,768 words, since many
attached processors need not contain that much buffering. Addresses in the file list below 4000 (hexadecimal)
designate other components on the trunk (such as the FEPs, or low-speed devices). The command
word sent to the PDC from the monitor in such cases carries a count of words to be transferred rather 
than blocks of words. 
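
The address ranges mentioned in this section can be collected into one classification, sketched below. Two ranges come from the text (8000 hexadecimal and above for multiple-unit disks, below 4000 for other trunk components); treating the remaining range, 4000 through 7FFF, as single disk units is an inference and is labeled as such in the code. The names are hypothetical.

```python
def classify_unit_address(address: int) -> str:
    """Classify a 16-bit unit address found in a file list."""
    if not 0 <= address <= 0xFFFF:
        raise ValueError("unit addresses are 16 bits long")
    if address >= 0x8000:
        return "multiple-unit disk"
    if address < 0x4000:
        return "other trunk component"
    return "single disk unit"   # inferred: the text does not name this range


def transfer_unit(address: int) -> str:
    """Unit in which the command word expresses the transfer count."""
    if classify_unit_address(address) == "other trunk component":
        return "words"
    return "blocks"
```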
Maintenance Interface


The Maintenance Control Unit (MCU) communicates with the FMP over any one or more designated channels of 
the network trunk. Any PDC can have the particular address and password loaded by its software at 
autoload time permitting it to accept MCU messages, and toggle the special maintenance control bits 
provided in the FMP CPU (see Section 3.6, FMP Functional Computer Specification). These control lines 
provide hardware level control and monitoring capability for any permitted processor on the trunk. 

Special Lines 

The FMP Functional Specification defines a set of lines (established by the PDC acting as the MCU 
interface) whose function is to control degradation, configuration, and activity of the FMP. Lines such 
as ‘stop’, ‘disable instruction overlap’ are needed for system failures and during scheduled maintenance. 

The FMP monitor must contain the capability to exercise the options defined by these lines. Until more 
detailed CPU design is completed these special functions must be limited to their STAR-100 counterparts 
discussed in Section 3.6.1. 


Messages 

In addition to the messages discussed in Monitor/System Communications above, the monitor can issue a 
set of privileged messages to the MCU processor, via the PDC. Included in this set of messages would be: 

1. Disable instruction overlap — vector, map, swap, or scalar 

2. Disable SECDED error check, leave syndrome bits unmodified (used during diagnostic operation) 

3. Force SECDED error on trunk KK (diagnostics only) 

4. Help! (Undefined ailment, illogical combination of events discovered by monitor.) 

5. New job to be initiated, rotate assignment of pipelines.

Degradation 

The only mode of degraded operation permitted for the FMP is a reduction of the Backing Store hardware 
to a minimum of 32-million words and a reduction in the Main Memory to a minimum of two-million 
words. This degraded mode permits the maintenance of memory units off-line, since they are powered 
and cooled in those minimal unit quantities. Memory configuration is specified by a set of control bits 
preset by the MCU in a message to the appropriate PDC when the FMP is in the master clear state only. 
For purposes of addressing changes when in degraded mode, the memory is first divided into upper and 
lower (with each half being able to be the lower half-memory in degraded mode). Thus the first level of 
degradation consists of cutting the available memory in half, changing the apparent physical addresses to
take the healthy half of memory into the lower address space and locking out address references to the 
sick half memory. 

In this mode the maintenance station can enable memory references by the monitor or selected PDCs to 
provide diagnostic facilities at various clock rates. 
The Scalar Processor is capable of operating with the Vector and Map Units disabled, and even powered
off. 
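
The first level of memory degradation described above (halve the memory, map the healthy half into the lower address space, lock out the other half) can be sketched as an address translation. The function name and the "lower"/"upper" labels are hypothetical; in the actual machine the mapping is selected by control bits preset by the MCU.

```python
def physical_address(apparent: int, full_size_words: int, healthy_half: str) -> int:
    """Translate an apparent address after first-level degradation.

    Only the lower half of the address space remains usable; references
    beyond it are locked out. If the upper half is the healthy one, it
    is made to appear as the lower half.
    """
    if healthy_half not in ("lower", "upper"):
        raise ValueError("healthy_half must be 'lower' or 'upper'")
    half = full_size_words // 2
    if not 0 <= apparent < half:
        raise IndexError("reference to locked-out half of memory")
    return apparent if healthy_half == "lower" else apparent + half
```

For a 1024-word toy memory with the upper half healthy, apparent address 5 maps to physical address 517.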


THE SYSTEM FUNCTIONS 

The basic approach in the FMP Operating System is to rely heavily on the utilization of functions already
implemented in existing software and operating on existing computing hardware. To meet this objective 
in the time frame established for the FMP installation requires the minimum disruption and redesign of the 
standard operating system components. This is accomplished by three techniques: 

a. Programming of the PDC to simulate interfaces already known by the existing software 
systems. Basic OS drivers can remain intact in most instances. Machines of alien 
architecture to each other but with massive entrenched software can be interfaced for 
relatively small cost. 

b. Constraint of the number and complexity of functions required by the FMP, and 
making the FMP responsible for any functions with extremely fast response time 
requirements (such as the reading and writing of the major data base disk systems). 

c. Providing most FMP services with job mode applications programs written in higher- 
level languages. 


Input/Output for the FMP

Management of all files in the FMP system will be handled by the FEP. Data transfers between elements 
of the system and file resources will be conducted directly by those elements, however. The FEPs must 
supply the functions normally required of general-purpose computers: 

1. Open file, verify access, set access mode (read, write, read/write). 

2. Close file, dispose of files (destroy, archive, keep in place). 

3. Allocate file, and file space. 

4. Expand and contract file space. 

5. Move files. 

6. Search, and retrieve files. 

7. Assemble files from subfiles. 

8. Build file maps for FMP disk transfers. 

These functions can be provided by most known operating systems, such as the CYBER NOS and NOS/BE 
systems. By judicious programming of the attached PDC it should be possible for the FEP operating
system to deal with attached peripherals (for which software is already in place) in the same manner as 
if the peripherals were directly attached in today’s customary manner. 

Interface for the FMP can be provided by an FMP message handler operating in job mode. This message 
handler can perform file searches and retrievals using the structured access file software native to the FEPs,
and transmit data to the FMP, or its high-speed disks, using only one specially developed trunk driver 
which would have to operate in the CYBER PPU. 
Job Scheduling for FMP


The FEP complex must perform the scheduling of sequences of jobs to be executed on the FMP. 

The decision on when and what to execute from the queue given to the FMP by the FEPs still remains 
for the FMP monitor as described in Allocation of FMP Resources above. 

For purposes of scheduling, the high performance disk system is considered part of the memory resources 
that belong to the FMP and must be allocated by the FEP. Taking the FEPs’ scheduling responsibilities 
for a job in order: 

1. Assessment of available system resources (tabulating amount and location of high
performance disk, low performance disk, central FMP memory, Backing Store,
available terminals, archive, and jobs already queued for FMP). 

2. Choice of next job to be assembled for the FMP — based on incoming requests 
for service 

a. priority, 

b. resources required, 

c. time limit (quick interactive or full run), 

d. availability of components of the job (programs and data). 

3. Reservation of high performance disk space. 

4. Layout of contiguous job space to be rolled into main FMP memory. 

5. Layout of contiguous storage area for the job to be allocated in the Backing Store. 

6. Building of file map for job and data files on the high-speed disks.

7. Creation of file entity with security and modes of permitted access. 

8. Storage of file header, file map on selected disk. 

9. Movement of program file from local storage to job file. 

10. Movement of data base to job data files. 

11. Closing of data and program files.

12. Transmission of queue request to FMP. 

13. Logging of all the above activities. 

14. On close file from FMP, evaluate the disposition code and perform required operation 
(note that only FEPs can destroy or otherwise dispose of files). 

15. Maintain job status from submission to FMP to job completion. 

16. Provide job mode programs for accounting. 

17. At job mode completion, follow through on disposition codes. 
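
Step 2 of the list above (choice of the next job) can be sketched as follows. This is only an illustration under stated assumptions: the class `JobRequest`, its field names, and the tie-breaking order are hypothetical and do not come from the FEP design.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class JobRequest:
    """One incoming request for FMP service (all field names are hypothetical)."""
    name: str
    priority: int            # larger means more urgent (assumed convention)
    disk_words_needed: int   # high performance disk space required
    quick_interactive: bool  # True for a quick interactive run, False for a full run
    components_ready: bool   # programs and data are all available


def choose_next_job(requests: Sequence[JobRequest],
                    free_disk_words: int) -> Optional[JobRequest]:
    """Pick the next job to assemble for the FMP, or None if nothing can run."""
    runnable = [
        r for r in requests
        if r.components_ready and r.disk_words_needed <= free_disk_words
    ]
    if not runnable:
        return None
    # Highest priority first; among equals prefer quick interactive runs, then
    # the smaller resource demand. This ordering is an assumption.
    return max(
        runnable,
        key=lambda r: (r.priority, r.quick_interactive, -r.disk_words_needed),
    )
```

The four criteria of step 2 appear as the four data fields; a real scheduler would also weigh the other resources tabulated in step 1.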

Exception Handling for FMP 


The front-end processors provide two functional entities which deal with system exceptions: 

1. Maintenance Control Unit (MCU) processing. 

2. Front-end system management. 

Programs representing the MCU functions will be prepared for at least the FEP computers. These modules 
will be given the MCU privileged password for communications with the FMP and control of the special 
MCU lines (channels) within the FMP CPU. These functions are defined by the MCU lines available, and 
by additional monitor communications to be defined by design work that is not yet complete. 

The FEP computers are responsible for recording all errors and determining what disposition to make of 
the partial data and remaining program and job data that are salvaged from aborted jobs. Restart, 
recovery and retrench functions are programmed in the FEP, and the semi-automated decision to apply 
such strategies is driven by FEP programs. 

Input/Output Bundling for Other Attached Processors 

In order to maintain full control over all the system resources, the FEPs should perform all file manage- 
ment functions afforded the FMP. In some instances a particular processor may require a unique attach- 
ment to a unique peripheral which need not be attached to the trunk. In these cases management of 
the attached peripheral resource becomes the responsibility of the processor that “owns” the resource. In 
all other cases the FEP or the PDC attached to a resource acts as the resource manager. As in the FMP, 
the actual data transfers bypass the FEP and go directly between the processor and the requisitioned 
resource. 


OPERATING SYSTEM STRUCTURE AND IMPLEMENTATION 

Within the limits of the operating system implementations for the front-end processing systems already in 
existence, the FMP operating system should conform to certain architectural and implementation ground 
rules. 


Programming Language 

All new components of the operating system should be written in a higher-level language. The same 
higher-level language should be used regardless of the processor being programmed. The programmable 
device controllers presently are implemented in assembly language because they operate in real-time, and 
each machine cycle must be accounted for. This situation may be unavoidable; however, some effort 
should be expended to see if major portions of PDC programs could be implemented in a higher-level 
language. 

The choice of higher-level language is not as important as the decision to require the use of one at all. 
However, the language PASCAL is becoming a pseudo-standard software programming language which is 
implemented on a variety of hardware, and widely taught in computer science courses. At this point 
PASCAL would appear to be the best choice since many language processors are being created and 
bootstrap systems produced for it. 

A dialect (if not the entire language) of PASCAL should be used as the primary programming language 
for the PDCs wherever possible. The compiler system could reside on one particular mainframe producing 
system code for the other attached processors, including the FMP. Development of such a system, with 
attendant documentation, debugging, and system design control aids should be initiated as soon as possible. 

Modularization 

One of the outgrowths of the distributed system philosophy is an enforced modularization of the software. 
As functions are distributed outward from a central computing facility into a network of ever smaller 
computers, the functional portions become the entire program module for the distributed computers. By 
enforcing a set of message standards for communications between such programs, a rigid set of boundaries 
can be defined for each and every module in a system. 

What remains is to break up all system functions into modules of like kind regardless of whether or not 
they actually reside together in a large central computer or in fragmented smaller machines. This means 
imposing message disciplines and constraints on intermodule communication, even within a common computer. 
Thus at some later date, a module could be moved to a different computer, with the identical messages 
being passed over the network instead of internally within the memory of a single computer. 
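
The message discipline described above can be illustrated with a minimal sketch. The `Message` fields and the `LocalTransport` class are hypothetical; the point shown is only that a module sends and receives messages through a transport, so the transport can later be replaced without changing the module.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Message:
    """A hypothetical fixed message form; the real fields are still to be specified."""
    destination: str  # name of the receiving module
    function: str     # requested operation
    payload: bytes    # operation-specific data


class LocalTransport:
    """Delivers messages between modules that share one computer."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[Message], None]] = {}

    def register(self, module_name: str, handler: Callable[[Message], None]) -> None:
        self._handlers[module_name] = handler

    def send(self, message: Message) -> None:
        # A network transport would serialize the same Message and place it on
        # a trunk instead; the sending module would not change.
        self._handlers[message.destination](message)


received = []
transport = LocalTransport()
transport.register("file_manager", received.append)
transport.send(Message("file_manager", "open", b"job-data"))
```

Moving the hypothetical `file_manager` module to another computer would then mean supplying a different transport object with the same `send` interface.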

A set of module implementation specifications must therefore be created and placed in this portion of 
any software specification, at a later phase in the project. 

Configuration Flexibility 

The FMP system must be capable of functioning even in severe states of degradation. This means that, 
as a minimum, the FMP in its most degraded state of memory, at least one FEP, sufficient disk space 
to queue one job, plus one interconnecting trunk must be available for completing flow model solutions. 

The system must also be able to sustain interactive communications, and to queue jobs on mass storage 
during interruptions of service by any of the other system components such as the FMP, archival storage 
subsystem, graphics subsystem, or high performance mass storage system. 

The operating system must cope with any possible combinations of configurations arising dynamically in 
an operating day. Equipments must be able to be taken off-line, without restarting the operating system, 
and new equipments installed without reassembly, recompilation, or other massive remapping of the operating 
system (the exception to this would be the reassembly of a small portion of the operating system, say the 
disk driver, to accommodate a new type of disk system). In most cases of this sort, however, the PDC 
(Programmable Device Controller) is expected to insulate the bulk of the operating system from such specific 
hardware changes. 

Extensibility 


Two factors militate against the creation of an ideal system design and implementation on the first 
installation of the FMP complex: 

1. The time and resources available to generate the first production system make it 
necessary to be ruthless about eliminating all but the most necessary production 
system functions from the first system implementations. 

2. The operating system must be adapted to the actual production requirements as 
learned from operation of the FMP system over an extended period of time. 

Decisions such as the location and allocation of system functions according to the 
“Distributed System Philosophy” must be reconsidered as actual experience detects 
bottlenecks and resource imbalance. 

It should be obvious from past computer system experience that these factors require a high degree of 
flexibility and adaptability in the operating system, as understanding of the FMP and its participation in 
the airframe design process matures. It should be possible to redesign, recompile, load and test experimental 
functional modules while the total system is on-line. Further, it should be possible to introduce new 
functional modules into the system, while the system is on the air in production mode, without fear of 
destroying the remainder of the system. Thus if an error occurs, it can only affect the modified or new 
modules and functions. 

Without this feature it can be expected that, although the hardware availability might be high due to 
extensive reliability engineering and the basic software might be reliable for the same reason, new software 
development requires so much dedicated system time that actual customer availability is severely affected. 
This is of particular concern when it is realized that certain components in the system (such as the FMP) 
are one-of-a-kind, as well as the total configuration being unique. Thus final debugging and testing would 
have to be done on the actual FMP complex. This consideration alone is so important it must be stressed 
with utmost vigor, as experience on the STAR-100 systems has shown. Since few STARs are available for 
software checkout, the STAR data center has become a major test and integration facility, substantially 
reducing the system availability to general customer utilization. 

The operating system architecture then must be defined with this goal in mind, and thence the manner 
and means of modularization and message implementation can be dictated. 

RAM (RELIABILITY, AVAILABILITY AND MAINTAINABILITY) 

The software system for a complex as large as envisioned for the FMP installation is as major a factor in 
system reliability and availability as is the hardware. A detailed specification of the RAM requirements for 
each software subsystem must be developed for the FMP as objectives for the initial and final production 
operating systems, and be included in this portion of a software specification. Two items requiring special 
care and attention are documentation and stability. 

Documentation 

Operating systems with many independent nodes, and a variety of functional modules require extensive 
documentation in excess of the commonly used listings, flow charts, and module descriptions. There must 
be established a set of documentation standards which engage the following issues: 

1. Extensive definition of all terminology in an alphabetic glossary (for example the term 
“message” must be rigorously defined and each of the fields in the message must be 
defined in terms of their meaning). 

2. A master outline of the operating system documentation must be created with a place 
for every single piece of system documentation (including listings of the source language, 
job control setups to form the loaded operating system, and any other preparation details). 

3. A set of required documentation rules to be used for imbedding thought as well as 
technique in the program listings is required. Thus management ground rules are needed 
such as “all term names must be fully spelled out in source language symbols, at the 
cost of more typing time at a terminal or more keypunch strokes” (phrases like “message 
length” are more desirable than “MSGLNG” for readability). 

4. A set of programming ground rules which aid documentation and future comprehensibility 
must be established and enforced. The structured programming schemes aided by the 
structure of PASCAL are one major example of an enforced technique. 

5. A theory of operation manual must be prepared for each component of the distributed 
system, as an overview of the total system. 

6. A message dictionary (with every form and every function described) must be created 
for the whole system. 

7. A master library (preferably automated to make updates timely) must be established for 
current, and past, versions of the source code, flow charts (or flow descriptions), and all 
versions of the software objectives, design and test objectives documentation. 


Stability 

A detailed breakdown of the stability requirements for each software module must be developed before the 
system implementation commences. That specification would then be placed here. The overall requirement 
is that the total software system (including operating system, compilers, and utilities) must have a mean 
time to failure no worse than the hardware system. 

Obviously a perfect system would be desired; however, practical experience shows that some degree of 
instability will remain in a system as long as new applications are submitted to it and, in particular, if 
new or modified functional modules of the operating system continue to be introduced throughout the 
lifetime of the system. 

                                TO (T)

96    4C    4       32    RG    DIV U; (R)/(S) TO (T)
97    4D    6       32    IN    HALF WORD ENTER R WITH I (16 BITS)
97    4E    6       32    IN    HALF WORD INCREASE R BY I (16 BITS)
97    4F    4       32    RG    DIV S; (R)/(S) TO (T)
97    50    A       32    RG    TRUNCATE; (R) TO (T)
98    51    A       32    RG    FLOOR; (R) TO (T)
98    52    A       32    RG    CEILING; (R) TO (T)
99    53    A       32    RG    SIGNIFICANT SQUARE ROOT; (R) TO (T)
99    54    4       32    RG    ADJUST SIGNIFICANCE; (R) PER (S) TO (T)
100   55    4       32    RG    ADJUST EXPONENT; (R) PER (S) TO (T)
101   56    7       64    SM    BSWAP; R --> S or S' --> T
102   57                        ILLEGAL
102   58    A       32    RG    TRANSMIT; (R) TO (T)
102   59    A       32    RG    ABSOLUTE; (R) TO (T)
102   5A    A       32    RG    EXP; (R) TO (T)
102   5B    4       32    RG    PACK; (R), (S) TO (T)

TABLE OF CONTENTS (Cont.)

Columns: Page, Function Code, Format Type, Number of Bits in Operand,
Instruction Type, Name of Instruction.

Page  Code  Format  Bits  Type  Name of Instruction
103   5C    A       B     RG    EXTEND; 32-BIT (R) TO 64-BIT (T)
103   5D    A       B     RG    INDEX EXTEND; 32-BIT (R) TO 64-BIT (T)
103   5E    7       32    NT    LOAD; (T) PER (S), (R)
103   5F    7       32    NT    STORE; (T) PER (S), (R)
104   60    4       64    RG    ADD U; (R)+(S) TO (T)
104   61    4       64    RG    ADD L; (R)+(S) TO (T)
104   62    4       64    RG    ADD N; (R)+(S) TO (T)
104   63    4       64    RG    ADD ADDRESS; (R)+(S) TO (T)
104   64    4       64    RG    SUB U; (R)-(S) TO (T)
104   65    4       64    RG    SUB L; (R)-(S) TO (T)
104   66    4       64    RG    SUB N; (R)-(S) TO (T)
104   67    4       64    RG    SUB ADDRESS; (R)-(S) TO (T)
105   68    4       64    RG    MPY U; (R)*(S) TO (T)
105   69    4       64    RG    MPY L; (R)*(S) TO (T)
105   6A                        ILLEGAL
105   6B    4       64    RG    MPY S; (R)*(S) TO (T)
105   6C    4       64    RG    DIV U; (R)/(S) TO (T)
105   6D    4       64    RG    INSERT BITS; (R) TO (T) PER (S)
106   6E    4       64    RG    EXTRACT BITS; (R) TO (T) PER (S)
107   6F    4       64    RG    DIV S; (R)/(S) TO (T)
107   70    A       64    RG    TRUNCATE; (R) TO (T)
108   71    A       64    RG    FLOOR; (R) TO (T)
108   72    A       64    RG    CEILING; (R) TO (T)
109   73    A       64    RG    SIGNIFICANT SQUARE ROOT; (R) TO (T)
109   74    4       64    RG    ADJUST SIGNIFICANCE; (R) PER (S) TO (T)
110   75    4       64    RG    ADJUST EXPONENT; (R) PER (S) TO (T)
111   76    A       B     RG    CONTRACT; 64-BIT (R) TO 32-BIT (T)
112   77    A       B     RG    ROUNDED CONTRACT; 64-BIT (R) TO 32-BIT (T)
112   78    A       64    RG    TRANSMIT; (R) TO (T)
112   79    A       64    RG    ABSOLUTE; (R) TO (T)
112   7A    A       64    RG    EXP; (R) TO (T)
112   7B    4       64    RG    PACK; (R), (S) TO (T)

Page  Code  Format  Bits  Type  Name of Instruction
113   7C    A       64    RG    LENGTH; (R) TO (T)
113   7D    7       64    NT    SWAP; S --> T, R --> T
114   7E    7       64    NT    LOAD; (T) PER (S), (R)
114   7F    7       64    NT    STORE; (T) PER (S), (R)
114   80                        ILLEGAL
114   81                        ILLEGAL
114   82                        ILLEGAL
114   83                        ILLEGAL
114   84                        ILLEGAL
114   85                        ILLEGAL
114   86                        ILLEGAL
114   87                        ILLEGAL
114   88                        ILLEGAL
114   89                        ILLEGAL
114   8A                        ILLEGAL
114   8B                        ILLEGAL
114   8C                        ILLEGAL
114   8D                        ILLEGAL
114   8E                        ILLEGAL
114   8F                        ILLEGAL
114   90                        ILLEGAL
114   91                        ILLEGAL
114   92                        ILLEGAL
114   93                        ILLEGAL
114   94                        ILLEGAL
115   95                        ILLEGAL
115   96                        ILLEGAL
115   97                        ILLEGAL
115   98                        ILLEGAL
115   99                        ILLEGAL
115   9A                        ILLEGAL
115   9B                        ILLEGAL
115   9C                        ILLEGAL
115   9D    D       E     SM    STREAM MAP
124   9E    D       B     SM    BUFFER READ/WRITE SETUP
123   9F    E       E     SM    VECTOR ARITHMETIC

131   A0                        ILLEGAL
131   A1                        ILLEGAL
131   A2                        ILLEGAL
131   A3                        ILLEGAL
131   A4                        ILLEGAL
131   A5                        ILLEGAL
131   A6                        ILLEGAL
131   A7                        ILLEGAL
131   A8                        ILLEGAL
131   A9                        ILLEGAL
131   AA                        ILLEGAL
132   AB                        ILLEGAL
132   AC                        ILLEGAL
132   AD                        ILLEGAL
132   AE                        ILLEGAL
132   AF                        ILLEGAL
132   B0    C       E     BR    INDEX; BRANCH IF (A)+(X) EQ (Z)
132   B1    C       E     BR    INDEX; BRANCH IF (A)+(X) NE (Z)
132   B2    C       E     BR    INDEX; BRANCH IF (A)+(X) GE (Z)
132   B3    C       E     BR    INDEX; BRANCH IF (A)+(X) LT (Z)
132   B4    C       E     BR    INDEX; BRANCH IF (A)+(X) LE (Z)
132   B5    C       E     BR    INDEX; BRANCH IF (A)+(X) GT (Z)
139   B6    5       NA    BR    BRANCH TO IMMEDIATE ADDRESS (R) + I (48 BITS)
139   B7                        ILLEGAL
139   B8                        ILLEGAL
139   B9                        ILLEGAL
139   BA                        ILLEGAL
139   BB                        ILLEGAL
139   BC                        ILLEGAL
139   BD                        ILLEGAL
139   BE    5       64    IN    ENTER R WITH I (48 BITS)
140   BF    5       64    IN    INCREASE R BY I (48 BITS)

Page  Code  Format  Bits  Type  Name of Instruction
140   C0                        ILLEGAL
140   C1                        ILLEGAL
140   C2                        ILLEGAL
140   C3                        ILLEGAL
140   C4                        ILLEGAL
140   C5                        ILLEGAL
140   C6                        ILLEGAL
140   C7                        ILLEGAL
140   C8                        ILLEGAL
140   C9                        ILLEGAL
140   CA                        ILLEGAL
140   CB                        ILLEGAL
141   CC                        ILLEGAL
141   CD    5       32    IN    HALF-WORD ENTER (R) WITH I (24 BITS)
141   CE    5       32    IN    HALF-WORD INCREASE (R) BY I (24 BITS)
141   CF                        ILLEGAL
141   D0                        ILLEGAL
141   D1                        ILLEGAL
141   D2                        ILLEGAL
141   D3                        ILLEGAL
141   D4                        ILLEGAL
141   D5                        ILLEGAL
141   D6                        ILLEGAL
141   D7                        ILLEGAL
141   D8                        ILLEGAL
142   D9                        ILLEGAL
142   DA                        ILLEGAL
142   DB                        ILLEGAL
142   DC                        ILLEGAL
142   DD                        ILLEGAL
143   DE                        ILLEGAL
143   DF                        ILLEGAL
                                ILLEGAL
143   EA
      ED                        ILLEGAL
      EE                        ILLEGAL
      EF                        ILLEGAL
144   F0                        ILLEGAL
      F1                        ILLEGAL
      F2                        ILLEGAL
144   F3                        ILLEGAL
      F4                        ILLEGAL
144   F5                        ILLEGAL
144   F6                        ILLEGAL
      F7                        ILLEGAL
      F8
      F9                        ILLEGAL
144   FA                        ILLEGAL
144   FB                        ILLEGAL
144   FC                        ILLEGAL
      FD                        ILLEGAL
144   FE                        ILLEGAL
144   FF                        ILLEGAL

1.0 SCOPE 

This specification for the CDC FLOW MODEL PROCESSOR 
(FMP) is to be used in conjunction with the 
CDC STAR-100 Computer Specifications. It 
is assumed that the reader is familiar with the 
concepts and terminology described in those 
documents. 

This is NOT a reference manual for user's groups. 
This document is written expressly for logic 
designers and diagnostic programmers. 


2.0 APPLICABLE DOCUMENTS 

10354637    CDC FLOW MODEL PROCESSOR Functional Computer Specification 


3.0 PERFORMANCE REQUIREMENTS 


3.1 General Description 



3.1.1 Instruction Formats and Types 

3.1.1.1 Instruction Formats - all fie[illegible] otherwise specified 

3.1.1.1.1 N/A 

3.1.1.1.2 N/A 

3.1.1.1.3 N/A 

3.1.1.1.4 Format 4 

| F        | R        | S        | T           |
| Function | Source 1 | Source 2 | Destination |

3.1.1.1.5 Format 5 

| F        | R           |
| Function | Destination |

3.1.1.1.6 Format 6 

| F        | R           | I  |
| Function | Destination | 16 |

3.1.1.1.7 Format 7 

| F        | R | S | T |
| Function | * | * | * |

* Described where used 

3.1.1.1.8 N/A 

3.1.1.1.9 Format 9 

| F        | G            | S | T |
| Function | Sub-Function | * | * |

3.1.1.1.10 Format A 

| F        | R        |   | T        |
| Function | Register |   | Register |

3.1.1.1.11 Format B 

| F        | G            | * | T            |
| Function | Sub-Function |   | Base Address |

* Described where used 

Unused areas must be cleared to zeros 

3.1.1.1.12 Format C 

| F        | (illegible) | X        | A        | Y     | B            | Z        | C        |
| Function |             | Register | Register | Index | Base Address | Register | Register |

* Unused area must be cleared to zeros 

3.1.1.1.13 Format D 

| F        | * | PC | Parcel 1 |
| Function |   |    | 16       |

* Unused area must be cleared to zeros. 

3.1.1.1.14 Subformats 

3.1.1.1.14.1 Format D1    Parcel 16 Bits 

Bits 0-15:   | A | B | C | D | E |

3.1.1.1.14.2 Format D2    Parcel 64 Bits 

Bits 0-31:   | A | B | C | D | Z | E |
Bits 32-59:  | F |
Bits 60-63:  | G |

3.1.1.1.14.3 Format D3    Parcel 16 Bits 

Bits 0-15:   | A | B | C |
1 

3.1.1.1.14.4 Format D4    Parcel 16 Bits 

Bits 0-15:   | A | B | C | D | E |

3.1.1.1.14.5 Format D5    Parcel 32 Bits 

Bits 0-19:   | A | B | C | D | E |
Bits 20-31:  | F |

3.1.1.1.15 Format E 

Bits 0-7:    | A |
Bits 8-15:   | B |
Bits 16-31:  | C | D | E | F | G | H | J | K | L | M | N |


CONTROL DATA CORPORATION    ENGINEERING SPECIFICATION    NO. 10354636    DATE: Dec. 1977    PAGE 19    REV. A    RADL 

3.1.1.2 Instruction Types 

3.1.1.2.1 Register Instructions (RG) 

In the register instructions, all operand sources 
and all result destinations are registers. R, S, 
and T each designate the contents of one of 256 
registers. 

A register may be used to hold one or both source 
operands as well as the result. Special case: if 
register 00 is designated as a source or result 
register, see Section 3.1.7. 

Unless stated differently in the instruction 
description, in all register-to-register operations 
the contents of the source registers are unchanged 
and the destination register is cleared before the 
result is transferred into it. 

3.1.1.2.2 Index Instructions (IN) 

The index instructions are used primarily in 
performing numerical calculations on field lengths 
and addresses. 

The term "replace" means replace only the specified 
bits. The phrase "replace the right-most 48 bits ..." 
implies that the left-most 16 bits are not altered. 

3.1.1.2.3 Branch Instructions (BR) 

Branch conditions may be determined by examining 
single bits, a 48-bit index, 32-bit floating-point 
operands or 64-bit floating-point operands. A 
special branch is provided to enter and leave the 
monitor program. All item counts in branch 
instructions are in half-words. 

3.1.1.2.4 Stream Instructions (SM) 

Stream instructions operate on ordered sets of data, 
executing in either the Swap, Map, Buffer, or Vector 
Units. A set of stream instructions consists of a 
32-bit header packet (see format D, 3.1.1.1.13) and a 
variable number of 32-bit packets containing parcels 
of data to be transmitted from the Scalar Processor 
to one of the three stream units (Vector, Map, Swap). 
Packets are fixed length (32 bits) while parcels are 
16-bit, 32-bit, or 64-bit length. The header and its 
associated packets of instructions constitute a form 
of high level micro-code for the particular function. 

Referring to format D under section 3.1.1.1.13, the 
stream designators are defined as follows: 

F  - Eight-bit instruction code 

PC - Four-bit packet count specifying the 
     number of 32-bit packets following the 
     header packet. Note that a count of zero 
     implies that the entire stream instruction 
     is contained in the first 32-bit packet 
     containing the header. 
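
The packet count rule above implies a simple length calculation, sketched here. The function name is hypothetical; only the 32-bit packet size and the four-bit count come from the text.

```python
PACKET_BITS = 32        # every packet, including the header, is 32 bits
MAX_PACKET_COUNT = 15   # largest value a four-bit PC field can hold


def stream_instruction_bits(packet_count: int) -> int:
    """Total length in bits of a stream instruction whose header holds PC."""
    if not 0 <= packet_count <= MAX_PACKET_COUNT:
        raise ValueError("PC is a four-bit field (0 through 15)")
    # One header packet plus PC following packets.
    return PACKET_BITS * (1 + packet_count)
```

A count of zero gives 32 bits (header only), and the largest count gives sixteen packets in total.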

The various subformats for the microinstruction 
parcels are given in section 3.1.1.1.14, formats D1 
through D5. The fields take on meanings dependent 
upon their use in a given parcel, for a given unit. 

In general, the field designators are used as 
follows: 

A - Subfunction field (for example, Ri 
    SETUP). 

B - Reference mode (immediate for 
    addresses imbedded in the instruction, 
    or indirect for addresses in the 
    specified registers in the Scalar 
    Processor register file). 

C - Word size (32-bit or 64-bit). 



D - Extension code, source field, or 
    length field depending on the 
    instruction. 

E - Register file pointer, length field, 
    base address, or source field 
    depending on the instruction. 

F - Memory address in Map and Buffer 
    instructions, source field in Vector 
    Unit instructions. 

G - Lower address bits (shift count) for 
    Map instructions, round flag in Vector 
    Unit instructions. 

H,J - Complement flag for B and D trunks in 
    Vector Unit. 

K - Null field. 

L,M,N - Vector Unit result bus select. 

Z - Unused 

3.1.1.2.5 N/A 

3.1.1.2.6 N/A 

3.1.1.2.7 N/A 

3.1.1.2.8 N/A 

3.1.1.2.9 Monitor Instructions (MN) 

Monitor instructions perform as described only when 
in monitor mode. When not in monitor mode, the 
monitor instructions perform as an illegal 
instruction would (see Section 3.1.4.2.2). 

3.1.1.2.10 Non-Typical Instruction (NT) 

The format and operation of these instructions are 
completely described under the individual instruction 
write-ups. 

3.1.2 Addressing 

Groups of bits in an address should be thought of as 
addressing various units of storage as illustrated 
in the chart below. 

Bit positions in a register or an instruction word: 

Unit addressed        Bit positions used 
Address of Sword      16 through 54 
Address of Word       16 through 57 
Address of Half-Word  16 through 58 
Address of Byte       16 through 60 
Address of Bit        16 through 63 



Within a word, bits, bytes, and half-words are always 
numbered from left to right. The lowest addressed 
bit, byte, or half-word is always the left-most bit, 
byte, or half-word in the word. 

All addresses are 48-bit quantities and contain 
enough information to reference a specific bit. 
Depending on the usage of an address, a certain 
number of the right-most bits in the address are 
ignored. For example, if a byte is being read, the 
right-most three bits of the address being used to 
reference it are ignored. Depending on the 
instruction, operands are counted on a bit, byte, 
half-word or word basis. 
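
The rule that right-most address bits are ignored can be sketched as a masking operation. This is an illustration only: the function and table names are hypothetical, and the half-word and word entries are inferred from the 32-bit and 64-bit unit sizes rather than stated in the paragraph above (which gives only the byte case).

```python
ADDRESS_BITS = 48

# Number of right-most address bits ignored for each unit of storage.
# "byte" is stated in the text; "half-word" and "word" follow from the
# 32-bit and 64-bit unit sizes (an inference).
IGNORED_LOW_BITS = {"bit": 0, "byte": 3, "half-word": 5, "word": 6}


def effective_address(bit_address: int, unit: str) -> int:
    """Return the 48-bit bit address with the ignored low bits forced to zero."""
    low = IGNORED_LOW_BITS[unit]
    address = bit_address & ((1 << ADDRESS_BITS) - 1)
    return (address >> low) << low
```

For example, reading a byte through the bit address 0b1111111 references the byte that starts at bit 0b1111000.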


|<----------- half-word 0 ----------->|<----------- half-word 1 ----------->|
| byte 0 | byte 1 | byte 2 | byte 3 | byte 4 | byte 5 | byte 6 | byte 7 |
bit 0-7    8-15     16-23    24-31    32-39    40-47    48-55    56-63 

The above figure illustrates the relative location of 
each bit, byte and half-word within a 64-bit word. 


If it is necessary to add addresses and item counts 
(indices or offsets), the item count is shifted left 
end off until it is properly aligned with the 
address. Binary zeros are attached to the right end 
of the quantity being shifted. 

The result of the addition always addresses a 
quantity having the same unit as the item count. For 
instance, if a byte count is added to any address, 
the result references a byte. This means that the 
right-most three bits of the address will be ignored. 
The following chart summarizes the process of adding 
an item count to an address and shows which bits are 
ignored in the resulting address. 
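
A minimal sketch of this addition follows. The names are hypothetical, the shift amounts for half-words and words are inferred from the unit sizes, and wrapping the sum to 48 bits is an assumption that the text does not address.

```python
ADDRESS_BITS = 48
ADDRESS_MASK = (1 << ADDRESS_BITS) - 1

# Left shift needed to align an item count with a bit address.
UNIT_SHIFT = {"bit": 0, "byte": 3, "half-word": 5, "word": 6}


def add_item_count(base_bit_address: int, item_count: int, unit: str) -> int:
    """Add an item count (index or offset) in the given unit to a bit address."""
    shift = UNIT_SHIFT[unit]
    # Shift left "end off": zeros enter on the right, and bits pushed past
    # the left end of the 48-bit quantity are lost.
    aligned = (item_count << shift) & ADDRESS_MASK
    total = (base_bit_address + aligned) & ADDRESS_MASK  # wrap is an assumption
    # The result addresses the same unit as the item count, so the low
    # bits are ignored; here they are forced to zero.
    return (total >> shift) << shift
```

Adding a byte count of 2 to the bit address 0x1007 gives 0x1010: the count becomes 16 bits, and the three low bits of the sum are dropped.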


Base address: bits 16 through 63. 

Item counts (indices or offsets): 

Item count      Bits shifted off (**)   Bits entering the addition   Zero bits attached on the right 
A. words        16 through 21           22 through 63                6 
B. half-words   16 through 20           21 through 63                5 
C. bytes        16 through 18           19 through 63                3 
D. bits         none                    16 through 63                none 

Resultant addresses: 

Result          Bits used               Bits ignored (*) 
A. words        16 through 57           58 through 63 
B. half-words   16 through 58           59 through 63 
C. bytes        16 through 60           61 through 63 
D. bits         16 through 63           none 

* These bits in the resultant address are ignored. 

** These bits in the index or offset are shifted off and do not 
enter the address calculation. 

3.1.2.1 Memory Hierarchy Addressing 

There are four levels of memory accessible to the 
programmer: Register File, Vector Unit Buffer (VUB), 
Main Memory and Backing Store. Each of these 
memories can be addressed by instructions in two 
different ways: direct and indirect. 


3.1.2.1.1 Direct Addressing 

Memory addresses can be contained within the 
instruction itself, and are therefore called direct 
addresses. In the case of the Register File, such 
direct addresses are called register designators, 
each designator being assigned a name such as R, S, 
or T. A direct register file reference in an 
instruction can access any one of 256 registers 
(64-bit or 32-bit). 

Vector Unit Buffers can contain from one to four 
thousand words each. Thus a field of twelve bits is 
established (see formats D1 through D5) for the 
insertion of the buffer address. 

Main memory addresses permit accessing up to 128 
million 64-bit words, thus 27 bits are 
established as the direct address field for this 
format. 

Backing store references always access data in 
32,768 64-bit word blocks. Twenty-seven bits of 
direct address field are allocated to permit 
referencing up to 128 million of these blocks, or 
4.4 x 10^12 words of data. 
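
The capacities quoted above follow directly from the field widths. The short calculation below is added only as a check of that arithmetic; the variable names are hypothetical.

```python
DIRECT_ADDRESS_BITS = 27          # main memory and backing store direct address fields
VUB_ADDRESS_BITS = 12             # Vector Unit Buffer direct address field
WORDS_PER_BACKING_BLOCK = 32_768  # 64-bit words per backing store block

vub_words = 2 ** VUB_ADDRESS_BITS
main_memory_words = 2 ** DIRECT_ADDRESS_BITS
backing_store_blocks = 2 ** DIRECT_ADDRESS_BITS
backing_store_words = backing_store_blocks * WORDS_PER_BACKING_BLOCK

print(vub_words)            # 4096
print(main_memory_words)    # 134217728, i.e. 128 * 2**20
print(backing_store_words)  # 4398046511104, about 4.4e12
```

Here "128 million" is read as 128 x 2^20, which is exactly 2^27.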

The actual amount of physical memory present is 
determined by the specific machine configuration. 
Memory not actually in existence causes a data flag 
branch to occur at the time of reference. 
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3.1.2»i«2 Indirect Addressing 


The memory addresses can also appear in 64-bit
registers in the Register File. Thus an instruction
can reference memory indirectly by giving the
appropriate register file address, which points to
the register containing the desired memory address.


The allocation of address bits in the designated
registers is as follows:

o Register file address   -- rightmost 8 bits of
                             the indirect register
                             (bits 56-63). This
                             allocation is used only
                             for the SWAP (70)
                             instruction.

o Vector buffer           -- all addresses are bit
  unit addresses             addresses, thus the
                             rightmost 21 bits of
                             the indirect register
                             are used (bits 43-63).

o Main memory addresses   -- all addresses are bit
                             addresses, thus the
                             rightmost 33 bits of
                             the indirect register
                             are used (bits 31-63).

o Backing store memory    -- all addresses are bit
                             addresses, thus the
                             entire 48-bit address
                             is used.
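As a hedged illustration of the allocation above, the following Python sketch extracts the right-justified address field from a 64-bit indirect register. The function and table names are hypothetical. Placing the 48-bit backing store address in bits 16-63 is an assumption made here by analogy with the other three fields; the text only says that the entire 48-bit address is used.

```python
# Bits are numbered 0 (left-most) to 63 (right-most), as in the text.
# Field widths come from the specification; the names are hypothetical.
ADDRESS_FIELD_BITS = {
    "register_file": 8,    # bits 56-63, SWAP (70) instruction only
    "vector_buffer": 21,   # bits 43-63
    "main_memory": 33,     # bits 31-63
    "backing_store": 48,   # assumed right-justified (bits 16-63)
}


def first_bit(kind: str) -> int:
    """Left-most bit position of the address field for this memory."""
    return 64 - ADDRESS_FIELD_BITS[kind]


def indirect_address(register: int, kind: str) -> int:
    """Bit address held in the right-most bits of a 64-bit register."""
    width = ADDRESS_FIELD_BITS[kind]
    return register & ((1 << width) - 1)


# Bits to the left of the field are ignored by this sketch.
print(hex(indirect_address(0xFFFF_0000_0000_01C0, "main_memory")))  # 0x1c0
```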


3.1.2.1.3 Illegal Addresses

Main memory addresses 0 through 100000 (base 16) are
reserved for the operating system. Any reference to
this address range by a Job mode program results in a
mode illegal abort of the program in execution.

Main memory addresses 0 through 4000 (base 16) are
reserved for the storage of the monitor's register
file. Any reference to this area by a monitor mode
memory access will cause a monitor mode illegal abort.


3.1.2.2 Instruction Addressing

Instructions are addressed on full-word and half-word
boundaries. The instruction address counter will,
therefore, be incremented by a half-word after
executing a 32-bit instruction and by a full word
after executing a 64-bit instruction. This allows
instructions to be packed contiguously in storage.

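The following chart illustrates the various ways
instructions may be packed within 64-bit words.


                    bit position

  0                   31 32                   63
 +----------------------+----------------------+
 | 32-bit inst. *       | 64-bit inst. upper   |
 +----------------------+----------------------+
 | 64-bit inst. lower   | 64-bit inst. upper   |
 +----------------------+----------------------+
 | 64-bit inst. lower   | 32-bit inst. *       |
 +----------------------+----------------------+
 |              64-bit instruction             |
 +----------------------+----------------------+
 | 32-bit inst. *       | 32-bit inst. *       |
 +----------------------+----------------------+

 *These could also be 32-bit packets of stream
  instructions.


Note that a branch is possible to any of the
instructions. The lower 5 bits in any branch address
will always be interpreted as zeros.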
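The counter behaviour described above can be sketched as follows. The function names are hypothetical, and the sketch assumes that instruction addresses are bit addresses. That assumption is consistent with the lower 5 bits of a branch address being read as zeros, since 2^5 = 32 bits is one half-word.

```python
HALF_WORD_BITS = 32
FULL_WORD_BITS = 64


def next_instruction_address(address: int, instruction_bits: int) -> int:
    """Advance the instruction address counter past one instruction."""
    if instruction_bits not in (HALF_WORD_BITS, FULL_WORD_BITS):
        raise ValueError("instructions are 32 or 64 bits long")
    return address + instruction_bits


def branch_target(address: int) -> int:
    """Interpret the lower 5 bits of a branch address as zeros."""
    return address & ~0x1F


# A 32-bit instruction followed by a 64-bit one that straddles a word.
pc = next_instruction_address(0, HALF_WORD_BITS)    # 32
pc = next_instruction_address(pc, FULL_WORD_BITS)   # 96
print(pc, hex(branch_target(0x47)))                 # 96 0x40
```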


3.1.3 Termination Rules

For instructions which terminate upon exhausting the
length of a data field, data string or vector, if
that item is exhausted prior to the first operand
fetch, the instruction becomes a no op, no data is
fetched and no data flags are altered.


3.1.3.1 Stream Instruction Termination

Stream instructions terminate when the result vector
is exhausted. Source vectors which are exhausted
before the result vector is exhausted are extended,
as required, with the operand designated in the 0
field (extend code).


3.1.3.2 (N/A)


3.1.3.3 (N/A)

3.1.3.4 (N/A)

3.1.4 Definitions and Rules


3.1.4.1 Overlap of Operand and Result Fields

If the result field overlaps a source field such that
elements of the result are stored in the source field
before elements in this portion of the source field
are read, undefined results may occur. That is, the
source elements may be the original elements or they
may be the newly-stored elements. The instruction's
results may become undefined. Note that some specific
instructions prohibit any overlap of source and
destination fields. This restriction is included in
the appropriate instruction descriptions.


3.1.4.2 Self-Modifying Programs, Undefined Instructions and
        Undefined Operands


3.1.4.2.1 Self-Modifying Programs [A2.0]

As a general rule, self-modifying programs are not
allowed. See Appendix A2.0 for further details.


3.1.4.2.2 Illegal Instructions

An instruction with an unused function code is
termed an illegal instruction and causes the
following:

A. If in monitor mode, an automatic branch to the
   address specified by the contents of absolute
   register 4 is executed.

B. If in job mode, an exchange to monitor mode is
   performed with execution beginning at the address
   specified by the contents of absolute register 3.


3.1.4.2.3 Undefined Instructions

The instructions with a defined F code but which
either have undefined bits set or specify an
undefined operation cause undefined results.


3.1.4.2.4 (N/A)


3.1.4.2.5 No op Instructions

The instructions that are defined as No op (no
operation) instructions do not fetch data and do not
alter data flags.


3.1.4.3 Floating-Point Format


3.1.4.3.1 32-bit Floating-Point Format

      +-- bit 0, exponent sign bit
      |
      |          +-- bit 8, coefficient sign bit
      |          |
      v          v
     +----------+--------------------------+
     | 8-bit    | 24-bit signed            |
     | signed   | coefficient              |
     | exponent |                          |
     +----------+--------------------------+
 bit  0        7 8                       31
                ^                          ^
                |                          |
                exponent binary point      coefficient
                                           binary point

           32-bit floating-point number


NO. 10354636
DATE Dec. 1977
PAGE 31

There are two 32-bit half-words in every 64-bit word.
A 32-bit floating-point number occupies a half-word.

A zero is a positive sign bit and a one is a negative
sign bit for both the exponent and the coefficient.

Both the exponent and the coefficient are expressed
as two's complement signed integers. Numbers are of
the form (c)2^x where c is the 24-bit signed
coefficient, x is the 8-bit signed exponent, and the
base is 2.

The range of useful coefficients is from 800000
(base 16) to 7FFFFF (base 16). This represents
numbers of the range -(2^23) through +(2^23 - 1).


The range of useful exponents is from 90 (base 16) to
6F (base 16), which is from minus 112 (base 10) to
plus 111 (base 10). The values of 70 (base 16)
through 8F (base 16) all fall into a special end case
range as defined by the following table. X is any
hexadecimal digit.


Element          Representation (base 16)

Machine Zero     8XXXXXXX
Indefinite       7XXXXXXX


Examples of 32-bit floating-point format represented
in base 16.


                   Exponent   Coefficient

+1                 00         000001
+1 normalized      EA         400000
-1                 00         FFFFFF
-1 normalized      E9         800000
+256 (base 10)     00         000100
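The examples above can be reproduced with a short decoder. This is a sketch under stated assumptions, not part of the specification: the function names are hypothetical, and returning a string for the two end cases is merely a convenience of the illustration.

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` of value as a two's complement integer."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def decode32(half_word: int):
    """Decode a 32-bit floating-point half-word (bit 0 is left-most)."""
    exponent_bits = (half_word >> 24) & 0xFF
    if 0x70 <= exponent_bits <= 0x7F:
        return "indefinite"        # 7XXXXXXX
    if 0x80 <= exponent_bits <= 0x8F:
        return "machine zero"      # 8XXXXXXX
    exponent = to_signed(exponent_bits, 8)
    coefficient = to_signed(half_word & 0xFFFFFF, 24)
    return coefficient * 2.0 ** exponent   # value is (c) * 2**x


for word in (0x00000001, 0xEA400000, 0x00FFFFFF, 0xE9800000, 0x00000100):
    print(f"{word:08X} -> {decode32(word)}")
```

Every useful value fits exactly in a Python float, since the coefficient has 24 bits and the exponent lies between -112 and +111.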


A floating-point number is normalized if the
coefficient sign bit is different from the next bit
to the right. This condition implies that the
coefficient has been shifted to the left as far as
possible. Note that an all zero coefficient requires
special attention for normalized operations.


3.1.4.3.2 64-bit Floating-Point Format


      +-- bit 0, exponent sign bit
      |
      |          +-- bit 16, coefficient sign bit
      |          |
      v          v
     +----------+--------------------------+
     | 16-bit   | 48-bit signed            |
     | signed   | coefficient              |
     | exponent |                          |
     +----------+--------------------------+
 bit  0       15 16                      63
                ^                          ^
                |                          |
                exponent binary point      coefficient
                                           binary point

           64-bit floating-point number


A 64-bit floating-point number is contained in a
64-bit word.


A zero is a positive sign bit and a one is a negative
sign bit for both the exponent and the coefficient.

Both the exponent and the coefficient are expressed
as two's complement signed integers. Numbers are of
the form (c)2^x where c is the 48-bit signed
coefficient, x is the 16-bit signed exponent, and the
base is 2.

The range of useful coefficients is from 8000 0000
0000 (base 16) to 7FFF FFFF FFFF (base 16). This
represents numbers of the range -(2^47) through
+(2^47 - 1).


The range of useful exponents is from 9000 (base 16)
to 6FFF (base 16), which is from minus 28,672
(base 10) to plus 28,671 (base 10). The values of
7000 (base 16) through 8FFF (base 16) all fall into a
special end case range as defined by the following
table. X is any hexadecimal digit.


Element          Representation (base 16)

Machine Zero     8XXXXXXXXXXXXXXX
Indefinite       7XXXXXXXXXXXXXXX

Examples of floating-point format represented in base
16.


                   Exponent   Coefficient

+1                 0000       0000 0000 0001
+1 normalized      FFD2       4000 0000 0000
-1                 0000       FFFF FFFF FFFF
-1 normalized      FFD1       8000 0000 0000
+256 (base 10)     0000       0000 0000 0100


A floating-point number is normalized if the
coefficient sign bit is different from the next bit
to the right. This condition implies that the
coefficient has been shifted to the left as far as
possible. Note that an all zero coefficient requires
special attention for normalized operations.


3.1.4.4 End Cases

If indefinite is used as an operand in a
floating-point instruction, both the upper and the
lower results are indefinite.


For the cases listed below, 0 represents machine zero
and N represents an operand which is neither machine
zero nor indefinite.


0 ± 0 = 0
0 ± N = ±N
N ± 0 = N


0 * 0 = 0
0 * N = 0
N * 0 = 0


0 / 0 = Indefinite
0 / N = 0
N / 0 = Indefinite


3.1.4.5 Floating-Point Compare Rules

Several of the instructions compare two
floating-point operands for:

   a. equality                     (r) = (s)

   b. non-equality                 (r) <> (s)

   c. greater than or equal to     (r) >= (s)

   d. less than                    (r) < (s)

For these examples, the first operand is represented
by (r) and the second operand by (s).


3.1.4.5.1 One or Both Operands Indefinite

If one operand is indefinite, no compare condition is
met since indefinite is not greater than, less
than, equal to, nor not equal to any other operand.

If both operands are indefinite, the (r) = (s) and
the (r) >= (s) conditions are met since indefinite is
defined equal to indefinite.


3.1.4.5.2 Neither Operand Indefinite but One or Both Operands
          Machine Zero

Any non-indefinite, non-machine zero operand with a
positive, non-zero, coefficient is strictly greater
than machine zero.

Any non-indefinite, non-machine zero operand with a
negative coefficient is strictly less than machine
zero.

Machine zero is equal only to itself and any number
having a finite exponent and an all zero coefficient.


3.1.4.5.3 Neither Operand Indefinite Nor Machine Zero

A. If the signs of the coefficients of the two
   operands are unlike, the operands are unequal
   and the operand with the positive coefficient is
   the larger of the two.

B. If the signs of the two coefficients are alike, a
   floating-point subtract upper is performed;
   operand r minus operand s.

   Condition met criteria are analyzed as follows:

   a. If the upper 48 bits of the result
      coefficient are all zeros          (r) = (s)

   b. If the upper 48 bits of the result
      coefficient are not all zeros      (r) <> (s)

   c. If the result coefficient is positive
                                         (r) >= (s)

   d. If the result coefficient is negative
                                         (r) < (s)

   The above criteria (a and b) for equality and
   non-equality do not guarantee that if r = s, that
   s = r when the following is true:

   a. The operands have unequal exponents.

   b. "1" bits exist in any of the right-most bit
      positions of the coefficient which will be
      shifted off the right during alignment of the
      smaller exponent. For example:

           0      16                          63
          +------+---------------------------+---+
      r = | 0004 |                           |   |
          +------+---------------------------+---+
      s = | 0008 |                           | X |
          +------+---------------------------+---+

                    Exponent difference = 4

      If X = 0 then r = s implies s = r

      If X <> 0 then if r = s, s <> r
                  or if s = r, r <> s


The order of events of the floating-point subtract
upper is first to complement the subtrahend, then
align the coefficient associated with the smaller
exponent and finally to perform a floating-point add
operation. The following is an example of r = s but
s <> r.

Operand r =          0100  0000 0000 1001
        s =          0104  0000 0000 0100

Complement s         0104  FFFF FFFF FF00
Align r              0104  0000 0000 0100  1

Add aligned r and
complemented s       0104  0000 0000 0000  1

Since the upper 48 bits of the result coefficient are
all zeros, the pair of operands are considered equal.
However, if the operands are interchanged, the
following happens:

Operand r =          0104  0000 0000 0100
        s =          0100  0000 0000 1001

Complement s         0100  FFFF FFFF EFFF
Align s              0104  FFFF FFFF FEFF  F

Add r and            0104  0000 0000 0100
complemented,        0104  FFFF FFFF FEFF  F
aligned s            ----------------------
                     0104  FFFF FFFF FFFF  F


Since the upper 48 bits of the result coefficient are
not all zeros, the pair of operands are considered
unequal.
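The two worked cases can be replayed in a few lines of Python. This is a simplified sketch, not the hardware algorithm: operands are modelled as hypothetical (exponent, coefficient) pairs of signed Python integers, right normalization and the indefinite and machine zero end cases are ignored, and the function names are invented for the illustration.

```python
LOWER_BITS = 47   # zeros appended when a 48-bit coefficient is extended


def subtract_upper_coefficient(r, s):
    """Upper coefficient of r - s; operands are (exponent, coefficient)."""
    r_exp, r_coef = r
    s_exp, s_coef = s
    s_coef = -s_coef                   # complement the subtrahend first
    r_ext = r_coef << LOWER_BITS       # extend both coefficients
    s_ext = s_coef << LOWER_BITS
    # Align the operand with the smaller exponent. Python's >> on ints is
    # an arithmetic shift, so the sign is extended and low bits are lost.
    if r_exp < s_exp:
        r_ext >>= s_exp - r_exp
    else:
        s_ext >>= r_exp - s_exp
    return (r_ext + s_ext) >> LOWER_BITS


def compare_equal(r, s):
    """(r) = (s) is met when the upper result coefficient is all zeros."""
    return subtract_upper_coefficient(r, s) == 0


r = (0x0100, 0x0000_0000_1001)
s = (0x0104, 0x0000_0000_0100)
print(compare_equal(r, s))   # True:  r = s
print(compare_equal(s, r))   # False: s <> r
```

The asymmetry comes entirely from the order of complementing and aligning: the low-order "1" bit is shifted off as a positive remainder in one direction and as a negative one in the other.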
3.1.4.6 Upper and Lower Results

The floating-point add, subtract and multiply
instructions generate a result coefficient twice the
length of the source operands' coefficients. The
left and right halves of this result are called the
upper result (U) and the lower result (L),
respectively.

The sign bit of the lower result's coefficient is not
affected in a lower operation and remains at zero in
two's complement arithmetic. The other bits of the
lower coefficient receive no special treatment.
Remember that a lower result is not meaningful alone,
but it must be used in conjunction with its
associated upper result.

Sections 3.1.4.6.1 - 3.1.4.6.4 are written for 64-bit
operands. For 32-bit operands, substitute 47 for 95,
46 for 94, 23 for 47, and 22 for 46 where the latter
numbers appear.


3.1.4.6.1 Right Normalization

When the result coefficient overflows its register,
a right shift of one place is necessary. In this
case, the entire 95-bit result is shifted right one
place with sign extension and one is added to the
exponent. This operation is known as
right-normalization and it is done, when necessary,
even if normalization is not explicitly specified by
the instruction. This may cause exponent overflow;
if so, the result is set to indefinite and data flag
bit 42 may be set.


3.1.4.6.2 Floating-Point Add

Regardless of their signs, both operands'
coefficients are extended to 94 bits in length, not
including sign, by adding 47 zeros to the right of
their binary points.

The exponents of the two operands are compared and
the 94-bit coefficient of the operand having the
smaller exponent is effectively shifted right one bit
and its exponent increased by one, successively until
the two exponents are equal. The sign of the shifted
coefficient is extended from the left to the right
during the shift. Negative coefficients approach a
minus one and positive coefficients approach zero as
they are shifted.

The add is a 94-bit operation, not including sign.
Right normalization takes place, if necessary. The
coefficient for the U result is the left-most 47 bits
and the coefficient for the L result is the
right-most 47 bits of the 94-bit result.

The exponent for the U result is equal to the larger
of the two operand exponents. Right-normalization
will increase this value by one, if it occurred.

The exponent for the L result is 47 (base 10) less
than the U result's exponent for all cases except
three:

a. Right-normalization causes the U exponent to
   overflow; the U result is set to indefinite, the
   L exponent will be 6FD1 (base 16) (59 (base 16) in
   the 32-bit case).

b. If the U result's exponent minus 47 (base 10)
   causes exponent underflow, machine zero is stored
   as the L result.

c. If either or both operands were indefinite, the U
   and L results are indefinite.
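The add sequence for 64-bit operands can be sketched as below. This is an illustration under simplifying assumptions: operands are hypothetical (exponent, coefficient) pairs of signed Python integers, and the three exceptional cases just listed (exponent overflow, exponent underflow and indefinite operands) are deliberately not modelled.

```python
COEF_BITS = 47   # coefficient bits, not including sign (64-bit format)


def fp_add(a, b):
    """Return ((U exponent, U coefficient), (L exponent, L coefficient))."""
    (a_exp, a_coef), (b_exp, b_coef) = a, b
    a_ext = a_coef << COEF_BITS    # append 47 zeros: 94 bits plus sign
    b_ext = b_coef << COEF_BITS
    exp = max(a_exp, b_exp)
    a_ext >>= exp - a_exp          # arithmetic right shift aligns the
    b_ext >>= exp - b_exp          # operand with the smaller exponent
    total = a_ext + b_ext
    limit = 1 << (2 * COEF_BITS)   # 2**94
    if not -limit <= total < limit:
        total >>= 1                # right normalization
        exp += 1
    upper = total >> COEF_BITS                # left-most 47 bits and sign
    lower = total & ((1 << COEF_BITS) - 1)    # right-most 47 bits, sign 0
    return (exp, upper), (exp - COEF_BITS, lower)


# 16 + 1: the bit aligned out of the upper result survives in L.
print(fp_add((4, 1), (0, 1)))
```

For the call shown, the upper result is 1 x 2^4 and the lower result is 2^43 x 2^-43, which together give 17.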
3.1.4.6.3 Floating-Point Subtract [A12.0]

The floating-point subtract operation is performed by
complementing the coefficient of the subtrahend and
performing a floating-point addition operation. The
complementation is a 48-bit, two's complement
operation and is performed before the operands are
extended to 94 bits.

The hardware used for Floating Add or Subtract
operations has an extra (or extended) coefficient
sign bit. This means that the complementation
of an 8000 coefficient is handled without the
right shift of one and increase of the exponent
by one as used elsewhere. This will cause a
result (although not mathematically incorrect)
which may differ from the result obtained when a
right shift of one with increase of one is used, when
the following conditions are met:

1. The operand of the pair having the larger
   exponent (OR either of the two operands if their
   exponents are equal) must have a coefficient of
   80...00.

2. This operation must require this same operand to
   be complemented due to

   a. being the subtrahend in a subtract operation
      OR

   b. sign control in either a subtract or an add
      operation.

3. The "other" operand must have a negative
   coefficient.


Example I: A - B

    A = 60 FFF000
    B = 64 800000

                       CDC FMP              Instruction
                       (extra sign bit      Specification
                       in parentheses)

B                      64 (1) 800000        64 800000
Complement of B        64 (0) 800000        65 400000

Align operand          60 (1) FFF000        60 FFF000
with smaller        -> 64 (1) FFFF00     -> 65 FFFF80
exponent

Add A plus          A  64 (1) FFFF00        65 FFFF80
complement of B    +B  64 (0) 800000        65 400000
                       -------------        ---------
                       64 (0) 7FFF00        65 3FFF80

Example II: A - B


If this operation is a Subtract Upper, the specified
result is indefinite (with the appropriate data
flags) while the CDC FMP result did not overflow. If
this operation were a Subtract Normalized, note the
following:


                       CDC FMP              Instruction
                                            Specification

Result of              6F (0) 7FFFF         70 3FFFFF
Subtract Upper

Normalize the          6F     7FFFF         6F 7FFFFE
Upper Result,
shifting zeros
in from the right


Note that the subtract operation is not always
commutative. In other words it is not always true
that (A-B) = -(B-A). This characteristic will be
observed if the following is true of A and B:

a. The exponents of A and B are not equal.

b. "1" bits exist in any of the right-most bit
   positions of the coefficient which will be shifted
   off the right during alignment of the smaller
   exponent.


Example of (A-B) <> -(B-A):

A = 0104 6FCB 807E 89F2

B = 0100 6FAC 3F5D A5FA <--+
                           |
These two 1 bits will be shifted off during
exponent alignment.


Complement B:     -B = 0100  9053 C0A2 5A06

Align B:          -B = 0104  F905 3C0A 25A0  6

A-B:               A = 0104  6FCB 807E 89F2
                  -B = 0104  F905 3C0A 25A0  6
                       -----------------------
                       0104  68D0 BC88 AF92  6

                 A-B = 0104  68D0 BC88 AF92

Align B:           B = 0104  06FA C3F5 DA5F  A

Complement A:     -A = 0104  9034 7F81 760E

B-A:               B = 0104  06FA C3F5 DA5F  A
                  -A = 0104  9034 7F81 760E
                       -----------------------
                       0104  972F 4377 506D  A

              -(B-A) = 0104  68D0 BC88 AF93


This differs from A-B in the last bit
position.


3.1.4.6.4 Results of the Floating-Point Multiply Instruction

When two floating-point numbers are multiplied, the
lower result retains the 47 least significant product
bits generated. The sign bit of the lower result is
always set to zero and the exponent of the lower
result is the sum of the two source operands'
exponents with the exceptions listed below.

The upper result retains the 47 product bits
immediately to the left of the bits retained by the
lower product. The sign of the upper product's
coefficient follows the normal rules of algebra. The
exponent of the upper result is the sum of the two
source operands' exponents plus 47 (base 10) with the
following exceptions:

a. The sum of the source operands' exponents (plus
   47 (base 10) if upper result) exceeds 6FFF
   (base 16), for which case the result exponent is
   set to indefinite.

b. The sum of the source operands' exponents (plus
   47 (base 10) if upper result) is less than 9000
   (base 16), for which case the result exponent is
   set to machine zero.

c. Either or both operands are indefinite, for which
   case the result exponent is set to indefinite.

d. Neither operand is indefinite but either or both
   operands are machine zero, for which case the
   result exponent is set to machine zero.

If either operand has a coefficient of 8000 0000 0000
and an exponent of X, the operand will be treated as
though its coefficient were C000 0000 0000 and its
exponent were X+1.
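The split of the product into upper and lower results can be sketched as follows. This is an illustration only: operands are hypothetical (exponent, coefficient) pairs of signed Python integers, the helper names are invented, and exceptions a through d above are not modelled.

```python
COEF_BITS = 47
MIN_COEF = -(1 << COEF_BITS)   # coefficient 8000 0000 0000 (base 16)


def adjust(exp, coef):
    """Treat 8000 0000 0000, exponent X as C000 0000 0000, exponent X+1."""
    if coef == MIN_COEF:
        return exp + 1, coef >> 1
    return exp, coef


def fp_multiply(a, b):
    """Return ((U exponent, U coefficient), (L exponent, L coefficient))."""
    a_exp, a_coef = adjust(*a)
    b_exp, b_coef = adjust(*b)
    product = a_coef * b_coef
    exp_sum = a_exp + b_exp
    lower = product & ((1 << COEF_BITS) - 1)   # 47 least significant bits
    upper = product >> COEF_BITS               # next 47 bits and the sign
    return (exp_sum + COEF_BITS, upper), (exp_sum, lower)


one = (-46, 1 << 46)            # +1 normalized (FFD2 4000 0000 0000)
print(fp_multiply(one, one))    # upper result is 2**45 * 2**-45 = 1
```

Without the 8000 0000 0000 adjustment, the product of two such coefficients would be 2^94, which needs one bit more than the 94 bits plus sign available.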

3.1.4.6.5 The Floating-Point Divide Instruction

The quotient from the divide operation is the result
of dividing the prenormalized, integer coefficient of
the divisor into the integer coefficient of the
dividend, generating a 47-bit quotient (23-bit
quotient for 32-bit divide). If either operand has a
coefficient of 8000 0000 0000, the operand will be
handled as though its coefficient were C000 0000 0000
and its exponent increased by one. When the divide
hardware normalizes the divisor coefficient, the
number of places shifted left is added to the
exponent of the quotient as defined below.


The exponent of the result will be given by the
following equation:

   Exponent of Quotient = (Exponent of Dividend)
                        - (Exponent of Divisor)
                        - (46 (base 10) - NC)

   where NC is the number of places shifted left
   to prenormalize the divisor. For the 32-bit
   divide operation 22 is subtracted rather than
   46.


The right-most bit of the quotient is neither rounded
nor adjusted. The remainder is not retained. The
sign of the quotient's coefficient follows the normal
rules of algebra.
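The exponent equation can be exercised with a small sketch. The function names are hypothetical, coefficients are modelled as signed Python integers, and the 8000 0000 0000 special case is left out. Only the exponent is computed; the quotient coefficient itself is not modelled.

```python
def prenormalize_shift(coef: int, bits: int = 48) -> int:
    """NC: left shifts needed until the sign bit and the next bit differ."""
    if coef == 0:
        raise ValueError("an all zero divisor coefficient is a divide fault")
    shifts = 0
    # A coefficient is not yet normalized while it still fits in one bit
    # less than the full field, i.e. sign bit equals the bit to its right.
    while -(1 << (bits - 2)) <= coef < (1 << (bits - 2)):
        coef <<= 1
        shifts += 1
    return shifts


def quotient_exponent(dividend_exp, divisor_exp, divisor_coef, bits=48):
    """Exponent of quotient = E(dividend) - E(divisor) - (46 - NC)."""
    nc = prenormalize_shift(divisor_coef, bits)
    return dividend_exp - divisor_exp - ((bits - 2) - nc)


print(quotient_exponent(0, 0, 1))               # 0   (NC = 46)
print(quotient_exponent(-46, -46, 1 << 46))     # -46 (NC = 0)
print(quotient_exponent(0, 0, 1, bits=24))      # 0   (32-bit: 22 - NC)
```

The second call divides +1 normalized by itself; an exponent of -46 is what a normalized quotient of 1 (coefficient 2^46) requires.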


3.1.4.6.6 Normalized Upper Results

The normalized add and subtract instructions generate
an intermediate result identical to the final result
of the Add U and the Subtract U instructions.
Normalization of the intermediate, 48-bit result then
takes place as follows:

The 48-bit coefficient is shifted left one bit
and its exponent is decreased by one, successively,
until the sign bit and the bit immediately to the
right of the sign bit are different. During this
shift, zeros are attached to the right end of the
48-bit coefficient. If reducing the exponent by one
causes exponent underflow, the result of the
normalization operation is defined as machine zero.
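A minimal sketch of this left-shift loop is given below. The names are hypothetical and the operand is a (exponent, coefficient) pair of signed Python integers. Treating any exponent below 9000 (base 16), that is -28,672, as underflow follows the exponent range given earlier for the 64-bit format. The all zero coefficient is rejected here because this passage does not define its handling.

```python
MIN_EXPONENT = -0x7000          # 9000 (base 16) as a signed 16-bit value
MACHINE_ZERO = "machine zero"


def normalize(exp: int, coef: int, bits: int = 48):
    """Shift left until the sign bit and the next bit differ."""
    if coef == 0:
        raise ValueError("all zero coefficient needs special attention")
    # Not normalized while the sign bit equals the bit to its right.
    while -(1 << (bits - 2)) <= coef < (1 << (bits - 2)):
        coef <<= 1              # zeros enter at the right end
        exp -= 1
        if exp < MIN_EXPONENT:
            return MACHINE_ZERO # exponent underflow
    return exp, coef


exp, coef = normalize(0, 1)
print(f"{exp & 0xFFFF:04X} {coef & 0xFFFFFFFFFFFF:012X}")   # FFD2 400000000000
exp, coef = normalize(0, -1)
print(f"{exp & 0xFFFF:04X} {coef & 0xFFFFFFFFFFFF:012X}")   # FFD1 800000000000
```

The two printed values match the "+1 normalized" and "-1 normalized" examples of the 64-bit format.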
3.1.4.6.7 (N/A)


3.1.4.7 (N/A)


3.1.4.8 (N/A)


3.1.4.9 (N/A)


3.1.4.10 (N/A)


3.1.4.11 Operand Size Definitions

The following definitions are implied throughout the
specification.

Word      - A 64-bit quantity, the address of
            the left-most bit always being a
            multiple of 64 (base 10).

Half-word - A 32-bit quantity, the address of
            the left-most bit always being a
            multiple of 32 (base 10).

Byte      - An 8-bit quantity, the address of
            the left-most bit always being a
            multiple of 8 (base 10).

Digit     - A 4-bit binary coded decimal number
            or sign. One digit per byte in zoned
            format and two digits per byte in
            packed BCD format.

Sword     - 512 bits (or 8 64-bit words).


3.1.5 Item Count (field lengths, offsets, indices, etc.)

All field lengths, offsets, indices, shift counts,
etc., are item counts which specify a number of
bits, digits, bytes, half-words or words.

Where an item count other than an index is
contained in a 48-bit field, there shall be at least
32 consecutive and identical sign bits. Sign bits
must always be extended to the left to fill the
16-bit or 48-bit field containing it.

The item count unit is specified by the instruction
title line code (see arrow).

Example:              |
                      v
3.2.1.67   42   4    32   RG   ADD N; (R)+(S) TO (T)

The 32 indicates that field lengths and indices are
expressed in 32-bit half-words. Any deviation from
this method of specifying the units for the various
item counts would be indicated in the instruction
description or in the description of the instruction
type. The instruction type refers to RG (register),
SM (stream), etc.

An index may be either positive or negative in sign.
The maximum magnitude of an index is a function of
its usage. The index is shifted to the left end-off
zero/three/five/six places before the addition to the
base address when the unit for the index is
bits/bytes/half-words/words. Digits are not used as a
unit for indices.

A field length must be positive in sign and have a
magnitude of less than 2^16; the use of a negative
field length causes that length to become strictly
undefined. Offsets are subtracted from the field
length in stream instructions, but note that for a
negative offset, this amounts to increasing the
length specification since subtracting a negative
quantity is addition.
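The index scaling rule can be sketched as below. The names are hypothetical, and the sketch ignores the end-off loss of bits shifted out of the left of the hardware field, since Python integers are unbounded.

```python
# Left shift applied to an index before it is added to a bit address.
INDEX_SHIFT = {"bits": 0, "bytes": 3, "half-words": 5, "words": 6}


def indexed_address(base_bit_address: int, index: int, unit: str) -> int:
    """Add an index, scaled to bits by its item count unit, to a base."""
    return base_bit_address + (index << INDEX_SHIFT[unit])


print(hex(indexed_address(0x1000, 3, "words")))         # 0x10c0
print(hex(indexed_address(0x1000, -2, "half-words")))   # 0xfc0
```

A negative index simply moves the address backwards, because the shift preserves the sign.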
3.1.6 Data Flag Branch Register [A7.0]


3.1.6.1 General Description

The data flag register is designed to give the
programmer an automatic branch to a special
routine for certain operands, results, conditions,
etc., without his having to pay the time penalty of
explicitly checking these conditions in his program.
If a condition which has been previously selected to
cause an automatic branch occurs during an
instruction, the instruction is completed, the
address of the next instruction which would have been
executed is stored into the address portion of
register 01 and a branch is made to the address
contained in register 02. The state of the data flags
in the invisible package is defined only if the
program was interrupted between instructions.


3.1.6.2 Register Description


   PRODUCT        MASK           DATA           FREE
   FIELD          FIELD          FLAGS          FLAGS

 +--------------+--------------+--------------+--------------+
 |   16 bits    |   16 bits    |   16 bits    |   16 bits    |
 +---+----------+---+----------+---+----------+---+------+---+
 | * | 3     15 | * | 19    31 | * | 35    47 | * |51  58| * |
 +---+----------+---+----------+---+----------+---+------+---+
  0 2            16 18          32 34          48 50      59 63


*Bits 0 through 2, 16 through 18, 32 through 34,
 48 through 50, and 59 through 63 of the data flag
 register are undefined. Any attempt to sample, set
 or clear these bits is meaningless and the result
 of any instruction trying to do so is undefined.

An additional register providing bits 64 through 127
has been added for the expanded vector capabilities.
Bit assignments, and location in the invisible
package, have not been made as yet.


3.1.6.2.1 Data Flag Bits

Data flags 35-47 indicate conditions that have
occurred. Bits 35-47 are cleared only by the Data
Flag Register Bit Branch and Alter, and the Data Flag
Register Load/Store instructions.


3.1.6.2.2 Mask Bits

A mask bit is associated with each of the data flags.
The mask bits have the function of selecting the
conditions for which the programmer wishes an
automatic data flag branch.

It is important to note that the associated mask bit
need NOT be set in order to set a data flag bit. The
mask function is solely one of enabling a particular
data flag to cause a bit to set in the product field.

The order in which the mask bit and its associated
data flag bit are set is immaterial, as the result is
the same; that is, their associated product bit is
set.


3.1.6.2.3 Product Bits

Each product bit is the dynamic logical product of a
data flag bit and its associated mask bit. Data flag
branches are performed when there is at least one one
in the product register and the data flag branch
enable bit is set.


3.1.6.2.4 Data Flag Branch Enable Bit

The data flag branch enable bit, bit 52, must be set
for an automatic data flag branch (DFB) to occur.

Bit 52 is automatically cleared by the hardware when
a DFB takes place. It must be reset with a Data Flag
Register Bit Branch and Alter or a Data Flag Register
Load/Store instruction to re-enable the DFB.
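The relationship between data flag, mask, product and enable bits can be sketched as follows. This is an illustration only: the function names are hypothetical, the register is modelled as a 64-bit Python integer with bit 0 left-most, and the product positions 3 through 15 are taken from the register layout (product bit p pairs with mask bit p+16 and data flag bit p+32).

```python
def bit(position: int) -> int:
    """Mask for one bit, numbering bit 0 as the left-most of 64."""
    return 1 << (63 - position)


ENABLE = bit(52)   # data flag branch enable bit


def product_bits(register: int) -> list:
    """Product bit p is set when mask bit p+16 and data flag p+32 are set."""
    return [p for p in range(3, 16)
            if register & bit(p + 16) and register & bit(p + 32)]


def data_flag_branch(register: int):
    """Return (branch taken?, register value afterwards)."""
    if product_bits(register) and register & ENABLE:
        return True, register & ~ENABLE   # hardware clears bit 52
    return False, register


# Exponent overflow flag (bit 42) with its mask (bit 26) and the enable bit.
register = bit(42) | bit(26) | ENABLE
print(product_bits(register))             # [10]
print(data_flag_branch(register)[0])      # True
```

After the branch the flag and mask bits are still set, so the product bit remains a one; only re-setting bit 52 re-enables the branch.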


3.1.6.2.5 Data Flag Register Bit Assignments

Product Bit
|
|  Mask Bit
|  |
|  |  Data Bit
|  |  |
v  v  v

3-19-35

Soft Interrupt. Monitor software can set bit 35 of
a Job's Data Flag Branch register while the register
is stored in the Job's invisible package. If, after
exchanging back to Job mode, bit 35 and its
corresponding mask bit (bit 19) are set, a normal
data flag branch occurs following completion of the
current instruction.

4-20-36

Job Interval Timer

5-21-37

N/A

6-22-38

N/A

7-23-39

The binary result exceeds the range of +(2^47 - 1)
(base 10).

8-24-40

Bit 40 is the inclusive OR of bits 37, 38 and 39.
Bit 24 masks bit 40. Bit 8 is the logical product
of bits 24 and 40.

9-25-41

Floating-point divide fault: The divisor has an all
zero coefficient or the divisor as read from the
register file or from central storage is machine
zero. If the divisor and/or the dividend is
indefinite, no divide fault exists. If a divisor
causes a divide fault, the quotient is set to
indefinite. The exponent overflow and result machine
zero data flags are not set by a divide whose
divisor caused a divide fault.

10-26-42

Exponent overflow: The exponent of the result is
larger than 6FFF (base 16) (6F (base 16) for 32-bit
arithmetic). Results are not checked for exponent
overflow until after the exponent adjustment for
normalization or significance has taken place. In
the adjust exponent instructions, if a left shift
exceeds the number of places required for
normalization, this data flag is set. Exponent
overflow causes the result to be set to indefinite;
therefore, the indefinite flag will always be set on
an exponent overflow. This exponent overflow data
flag is not set if either source operand from central
storage or the register file is indefinite or by a
divide instruction whose divisor causes a divide
fault.

11-27-43

Result Machine Zero: The exponent of the result
returned to Main Memory or to the Register File
is less than 9000 (base 16) (90 (base 16) for 32-bit
arithmetic). Result Machine Zero may be caused by
exponent underflow or by one or more of the input
operands being machine zero. The Result Machine Zero
data flag bit is not set by a divide whose divisor
causes a divide fault.

12-28-44

Bit 44 is the inclusive OR of bits 41, 42 and 43.
Bit 28 masks bit 44. Bit 12 is the logical product
of bits 28 and 44.


13-29-45

A negative source operand was encountered in a square
root instruction. The square root of the absolute
value of the operand is formed, and the two's
complement of this square root is stored as the
result.

14-30-46

An indefinite result was placed into central storage
or into the Register File, or either or both
operands of a floating-point compare were indefinite.

An indefinite result may be caused by one or both
operands of a floating-point arithmetic operation
being indefinite or by the occurrence of either a
divide fault or an exponent overflow.

15-31-47

Breakpoint; see section 3.2.1.5.


3.1.6.2.6 Free Data Flags

Bit 51 is the dynamic inclusive OR of the product
field. This bit is set if any of bits 4
through 15 are set. Bit 51 cannot be cleared
directly; bits 4 through 15 must be cleared to
accomplish this.

Bit 52 is the data flag branch enable bit. If bit 52 
is a one and bit 51 becomes a one (or vice 
versa) a data flag branch occurs at the end of 
the current instruction. See 3.1.6.3 for
additional information. Bit 52 is 
automatically cleared by the execution of a 
data flag branch. 


Bits 53, 54, and 55

There are no product or mask bits associated
with bits 53, 54, and 55. Bits 53, 54, and 55
are cleared out automatically during the 
initial phases of the instructions (unless the 
instruction is a no op — see Section 3.1.3) 
which may set any of them. Thus, if pertinent, 
these bits must be sampled before executing 
another instruction which would clear their 
previous state. The setting of bits 53, 54, 
and 55 does not cause a data flag branch. 

Bit 56 - A CPU gate associated with the Maintenance
Station monitoring counters (See Functional
Computer Specification listed in Section 2.0).

Bit 57 - A CPU gate associated with the Maintenance
Station monitoring counters (See Functional
Computer Specification listed in Section 2.0).

Bit 58 - N/A 


NO. 10354636 
DATE Dec., 1977 
PAGE 55 
[Table: op codes and the data flag bits (37, 38, 39, 41, 42, 43,
45, 46, 47, 53, 54 and 55) each may set. The table entries are not
legible in the source.]
3.1.6.3 Data Flag Branch (DFB) [A7.0]

If a bit in the mask field is set and its associated
masked data flag bit is set, the associated bit in
the product field becomes a one. Bit 51 in the free
flag field also becomes a one since it is the dynamic
inclusive OR of bits 4 through 15 of the product
field.

If bit 51 is a one from above and if bit 52 is also
set (this is the DFB enable bit), an automatic DFB
occurs. The DFB takes place sometime following the
termination of the instruction which caused the DFB
condition to exist. The execution of the DFB sets
the bit address of the next instruction into the
right-most 48 bits of register 01 and a branch is
made to the bit address contained in the right-most
48 bits of register 02. The DFB enable bit in the
flag mask register (bit 52) is automatically cleared
at this time. The left-most 16 bits of register 01
are cleared to zero by a DFB.
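
The mask, product and enable logic described above can be sketched as a small model. This is only an illustration under stated assumptions: the register is represented as a set of bit numbers (so no storage layout is implied), the offsets of 16 and 32 between product, mask and data flag bits are inferred from triples such as 12-28-44, and the function names are hypothetical rather than part of the specification.

```python
# Hypothetical model of the data flag branch (DFB) logic; not from the spec.
# The register is a set of bit numbers that are currently one.

PRODUCT_BITS = range(4, 16)  # product field, bits 4 through 15
MASK_OFFSET = 16             # assumed: mask bit = product bit + 16
FLAG_OFFSET = 32             # assumed: data flag bit = product bit + 32
DYNAMIC_OR_BIT = 51          # dynamic inclusive OR of the product field
DFB_ENABLE_BIT = 52          # data flag branch enable
LOW_48 = (1 << 48) - 1       # right-most 48 bits of a register


def update_product(bits):
    """Set product bits whose mask and flag bits are both one, then refresh bit 51."""
    out = set(bits)
    for p in PRODUCT_BITS:
        if (p + MASK_OFFSET) in out and (p + FLAG_OFFSET) in out:
            out.add(p)  # product bits stay set until cleared explicitly
    out.discard(DYNAMIC_OR_BIT)
    if any(p in out for p in PRODUCT_BITS):
        out.add(DYNAMIC_OR_BIT)
    return out


def dfb_pending(bits):
    """A DFB occurs when bit 51 and bit 52 are both one."""
    return DYNAMIC_OR_BIT in bits and DFB_ENABLE_BIT in bits


def take_dfb(bits, next_instruction_address, registers):
    """Save the return address in register 01, clear bit 52, return the target from register 02."""
    registers[1] = next_instruction_address & LOW_48  # left-most 16 bits cleared
    target = registers[2] & LOW_48
    return bits - {DFB_ENABLE_BIT}, target
```

For example, with mask bit 28, data flag bit 44 and enable bit 52 set, `update_product` raises product bit 12 and bit 51, and `dfb_pending` then reports that a branch is due.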

Programmer Notes 

DFB's are disabled when bit 52 is cleared. But if
bit 52 is set again before eliminating all the DFB
conditions, another DFB will occur which will change
the return address in register 01, and the machine may
wind up in a "tight loop" if proper caution is not
taken. Sampling bit 51 for a zero before setting bit
52 will prevent this situation for all cases except
those involving the job interval timer. When using
the job interval timer, it should be remembered that
the setting of bit 36 in the DFR occurs
asynchronously with respect to instruction execution
once the job interval timer is loaded. Thus the timer
may set bit 36 after the check of bit 51 and before
the branch to the contents of register 01. One method
of handling this situation is to examine the contents
of register 01 upon entering the routine for handling
data flag branches. If register 01 indicates that the
branch occurred outside the DFB routine, then
register 01 could be copied to a temporary location.
If register 01 indicated that the branch had occurred
within the DFB routine, then register 01 would not be
copied to the temporary location. At the conclusion
of the DFB routine, a branch would always be taken to
the contents of the temporary location.

A simpler method is to combine the setting of bit 52
and the branch to the contents of register 01 into a
single 33 instruction (33603401).


3.1.7 Register File

For register operations, the 8-bit instruction
designators directly address the 256 (decimal) registers of
the Register File. During program execution (monitor
or job), these registers reside in the Register File.
When an exchange operation occurs, the registers are
stored into 256 (decimal) memory locations beginning at bit
address zero if in monitor mode and bit address
4000 hex if in job mode. The registers may not be
referenced as memory by their associated monitor or
job program. The only exceptions to this rule are
the B7 and BA instructions with G-bit 7 set. (The B7
and BA instructions are illegal in the CDC FMP.)
Figure 1 shows a map of the Register File and the
relationship between the register, its storage
address for monitor mode and its 8-bit designator.

The number on the right represents the bit address
and the number on the left is the value of the 8-bit
designator for the 64-bit register case. The number
inside the register represents the value of the 8-bit
designator for the 32-bit operand case. Note that any
reference to 32-bit register one is undefined.


8-bit Designator                                   Monitor Mode
(64-bit register)    Bit 0       31 32       63    Bit Address

      0              | /////////// | /////////// |   0...0000
      1              |      2      |      3      |   0...0040
      2              |      4      |      5      |   0...0080
      .                     .             .              .
      7F             |     FE      |     FF      |   0...1FC0
      80             |             |             |   0...2000
      .                     .             .              .
      FF             |             |             |   0...3FC0

(All designator and address values are hexadecimal.)

Figure 1. Register File
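
The mapping in Figure 1 can be expressed as a short calculation. The following is a minimal sketch, assuming that each 64-bit register occupies 64 consecutive bits of storage, that a 32-bit designator selects half of a 64-bit register (as the figure's pairs 2/3 and 4/5 suggest), and that the job-mode image starts at bit address 4000 hex as stated in the text. The function name is hypothetical.

```python
# Hypothetical helper: storage bit address of a register after an exchange.

MONITOR_BASE = 0x0000  # monitor registers are stored from bit address zero
JOB_BASE = 0x4000      # job registers are stored from bit address 4000 hex


def register_bit_address(designator, job_mode=False, operand_bits=64):
    """Return the bit address at which the designated register is stored."""
    if not 0 <= designator <= 0xFF:
        raise ValueError("designator must fit in 8 bits")
    if operand_bits not in (32, 64):
        raise ValueError("operand size must be 32 or 64 bits")
    base = JOB_BASE if job_mode else MONITOR_BASE
    # A 64-bit designator steps by 64 bits, a 32-bit designator by 32 bits.
    return base + designator * operand_bits
```

Under these assumptions, 64-bit designator 7F maps to bit address 1FC0 and 32-bit designator FE maps to the same address, in agreement with the figure.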
A. Register Zero (Job or Monitor Mode)

1. During an exchange operation the contents
of the trace register and the appropriate
memory location for register zero are
exchanged (swapped).

Monitor to Job:

                           Before       After
                           Exchange     Exchange

Absolute Address Zero         A            C
Trace Register                C            A

Job to Monitor:

                           Before       After
                           Exchange     Exchange

Absolute Address Zero         A            A
Trace Register                C            A

During a 7D (Swap) instruction involving register
zero as part of the register field, note a
required peculiarity. Although the current
contents of the trace register are sent to
the appropriate memory location for register
zero, the current contents of the trace register
are not altered.

                           Contents     Contents
                           Before 7D    After 7D

Memory location for
register zero                 A            B
Trace register                B            B



2. Register zero when referenced by a designator
will provide machine zero as an operand
except when used as a source register for
a base address or other description for a
stream instruction, in which case register
zero will appear to contain 64 zero bits. The
use of a zero address may cause the
instruction to be treated as an illegal
instruction as defined in Section 3.1.10. The
use of a zero field length may cause the
instruction to become undefined, such as the
3B instruction. If register zero is specified
as the destination register, the instruction
typically performs normally with data flags
being set, if warranted, but no data is
stored. Some instructions become undefined if
register zero is specified as a destination
register.


The following tables are intended to define what
operand is obtained when register zero is
specified for a source operand. To simplify
this chart, specifying of register zero as a
destination register has been ignored. A blank
in the chart indicates where it is either not
possible to specify register zero or it may only
be specified as a destination register. The
designators R, S, T, G, X, A, Y, Z and C are
used for convenience although they do not apply
to all instructions. Utilization of the following
symbols is made.


          Result When Register Zero is Referenced
Symbol    for an Operand

M         Machine zero is provided.

          8000 0000 0000 0000 hex    64-bit mode
          8000 0000 hex              32-bit mode

A         All zero is provided.

Z         All zero in the used portion.

          In this instance the left-most bit
          is not used, thus machine zero and
          all zeros are indistinguishable.

N         Instruction performs as a no op.

C         No control vector is used.

0         A mask of all ones is provided.
[Table: instruction op codes 00 through 3F against the R, S and T
designators. The designator column under which each symbol falls is
not legible in the source; the symbols recovered for each op code
are listed left to right. Op codes not listed show no symbol.]

Op Code   Symbols          Op Code   Symbols

  04      Z                  23      M Z
  09      Z Z                2C      M M
  0A      Z                  2D      M M
  0E      Z Z                2E      M M
  10      M                  2F      Z Z
  11      Z                  30      M
  12      Z Z                31      Z Z Z
  13      Z Z Z              32      Z Z
                             33      Z
                             34      M Z
                             35      Z Z Z
                             36      Z Z
                             38      M
                             3A      Z
                             3B      Z
                             3C      Z Z
                             3D      Z Z
                             3F      Z
[Table: instruction op codes 40 through 7F against the R, S and T
designators. Only the following rows are legible in the source;
symbols are listed left to right.]

Op Code   Symbols          Op Code   Symbols

  40      N N                60      M M
  41      M M                61      M M
  42      M M                62      M M
                             64      M M
                             67      M M Z
B. 64-bit registers one and two (32-bit registers 2
through 5)

If data flag branches are being used, 64-bit
registers one and two must be reserved
exclusively for that use. Register one is the
data flag branch exit address and register
two holds the data flag branch entry address.

C. Monitor's 64-bit registers 0-F hex (32-bit registers
0-1F hex)

Registers zero, one and two have the restrictions
listed in A and B above. Registers 3 through 7
are used for the illegal instruction, exit force,
and external interrupt entry points.

D. 32-bit register one (right-most half of 64-bit
register 0)

Any reference to 32-bit register one is undefined.

3.1.8 Real-Time Counters

3.1.8.1 Free Running Clock

This clock consists of a free-running 47-bit counter
and a positive sign bit for a total of 48 bits. It
can be stored into register T using a "Transmit Real-
Time Clock to T" (39) instruction. This counter
increments at a one MHz rate.

3.1.8.2 Monitor Interval Timer

The monitor interval timer is a 24-bit timer that
decrements at a one MHz rate.

This timer can be loaded from register R using the
"Transmit (R) to Monitor Interval Timer" (0A)
instruction, when the computer is in monitor mode.

The timer can be activated by loading it with
anything but all zeros. Once it is activated,
it will decrement until it reaches zero or is
deactivated. When the timer is decremented to zero,
it will cause an external interrupt on channel 16
which must be processed like any other external
interrupt.

The timer is deactivated by the following methods:

1. Master clear

2. Loading with all zeros

3. Decremented to all zeros (when it is decremented
to all zeros and has caused an external interrupt,
it will be inactive until loaded with some value
other than zero).

3.1.8.3 Job Interval Timer

The job interval timer is a 24-bit counter
decrementing at a one MHz rate.

This clock can be loaded (in job mode) only from
register R using a 3A (Transmit R to Job Interval
Timer) instruction. Once loaded, the timer continues
to decrement until either an exchange to monitor
mode occurs, the timer decrements to zero, or the
timer is loaded with a value of zero. If an exchange
to monitor mode occurs, the decrementing of the job
interval timer is stopped and the current contents
of the timer are stored in the invisible package.
When the execution of that job is resumed, the job
interval timer is loaded from the invisible package
and resumes decrementing.

When the timer decrements to zero, bit 36 of the
data flag branch register will be set. Thus, if
the corresponding mask bit is set, a data flag
branch would then occur during the next RNI.

The timer may be deactivated by loading it with a
value of zero. This does not cause bit 36 of the
data flag branch register to be set. Master clear
will also deactivate the job interval timer.


The timer is deactivated by the following methods: 

1. Master clear 

2. Loading with a value of zero 

3. Decrementing to zero 

The contents of the job interval timer may be 
sampled by use of the 37 instruction (Transmit 
Job Interval Timer to T) . This does not 
deactivate the counter. 
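
Because the monitor and job interval timers are both 24 bits wide and count at a one MHz rate, the range of usable intervals follows from simple arithmetic. The sketch below works this out; it assumes one decrement per microsecond and that a timer loaded with N reaches zero after N decrements. The function names are hypothetical and not part of the specification.

```python
# Hypothetical arithmetic for the 24-bit interval timers and the 47-bit clock.

TICK_HZ = 1_000_000               # all three counters run at a one MHz rate
TIMER_BITS = 24                   # width of the monitor and job interval timers
MAX_LOAD = (1 << TIMER_BITS) - 1  # largest value that can be loaded


def load_value_for(microseconds):
    """Return the timer load value for an interval given in microseconds."""
    if microseconds == 0:
        raise ValueError("loading zero deactivates the timer")
    if not 0 < microseconds <= MAX_LOAD:
        raise ValueError("interval does not fit in a 24-bit timer")
    return microseconds  # assumed: one decrement per microsecond


def longest_interval_seconds():
    """Longest interval a single load can time, in seconds."""
    return MAX_LOAD / TICK_HZ


def free_running_wrap_days(counter_bits=47):
    """Days until the free-running counter wraps, assuming it starts at zero."""
    return (1 << counter_bits) / TICK_HZ / 86_400
```

On these assumptions a single load can time at most a little under 17 seconds, so longer intervals would have to be built from repeated loads in software.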


3.1.9 N/A 


3.1.10 Exchange Operations and Invisible Package [A6.0]

The purpose of the exchange is to change the prime
role of the CPU from monitor mode to job mode or
from job mode to monitor mode.

The exchange operation from monitor to a job is
always accomplished with an exit force instruction.
This causes the contents of the invisible package to
be loaded into the appropriate registers; the mode to
be changed from monitor to job, enabling interrupts;
and execution to begin as specified by the invisible
package. Note that this may be the restarting of a
previously interrupted program.

The exit force instruction and the channel interrupt
are the two normal ways of getting from a job in job
mode to the monitor program in monitor mode.
Attempting to execute a monitor-type instruction in
job mode, or attempting to execute an undefined
op-code, is the third way into the monitor.
Except for the starting point in the monitor program,
the operations performed in getting to the monitor are
identical for the three. Sufficient information to
restart this job is stored into the invisible package
and the mode is changed from job to monitor. The
monitor program is executed starting at the absolute
address contained in the right-most 48 bits of the
monitor's register 3, 5, or 6.

                                     Monitor register, the
Method of getting                    contents of which is
to the Monitor                       used to set P

1. Attempt to perform an             Register 3
   illegal instruction or a
   monitor-type instruction
   in job mode

2. Attempt to perform an             Register 4
   illegal instruction in
   monitor mode

3. Exit force                        Register 5

4. External interrupt                Register 6


The right-most ten bits of the absolute starting
address of the invisible package must be zeros.
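
The alignment rule above means that an invisible package must start on a 1024-bit boundary, which is sixteen 64-bit words. A monitor could check or round up a candidate address as sketched below; the function names are hypothetical and the sketch assumes that addresses are plain bit addresses.

```python
# Hypothetical alignment helpers for invisible package bit addresses.

PACKAGE_ALIGN_BITS = 1 << 10  # right-most ten address bits must be zero


def is_package_aligned(bit_address):
    """True if the bit address may be used as an invisible package start."""
    return bit_address >= 0 and bit_address % PACKAGE_ALIGN_BITS == 0


def next_package_address(bit_address):
    """Round a bit address up to the next permitted package start."""
    return -(-bit_address // PACKAGE_ALIGN_BITS) * PACKAGE_ALIGN_BITS
```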

The monitor must set up an invisible package for
each job. There is NO invisible package for the
monitor program itself.

To start a job initially, the monitor must clear
the entire invisible package area except for the
program address areas.

For a more detailed description of the exchange
operation, see the applicable computer specification
as listed in Section 2.0.

INVISIBLE PACKAGE

Absolute
Word Address    Contents

XXX...X0        Program Address (left-most field cross-hatched)
XXX...X1        Breakpoint (left-most field cross-hatched)
XXX...X2        (not legible in the source)
XXX...X3        Cross-hatched
XXX...X4        Data Flag Register
XXX...X5        Cross-hatched
XXX...X6        Cross-hatched
XXX...X7        Cross-hatched
XXX...X8        ASCII Mode Bit (clear bit for ASCII mode, set bit
                for EBCDIC mode); Job Interval Timer; remaining
                fields cross-hatched
XXX...X9        Cross-hatched
XXX...XA        Current Instruction
XXX...XB        Cross-hatched
XXX...XC        Cross-hatched
XXX...XD        Cross-hatched
XXX...XE        (not legible in the source)
XXX...XF        Cross-hatched


The computer returns the information in the
non-cross-hatched areas except as noted in Appendix A6.0.
For specific detail in the cross-hatched areas see the
applicable machine specification as listed in Section 2.0.




3.2 Performance Characteristics

3.2.1 Instruction Descriptions

The instruction titles (3.2.1.1 - 3.2.1.256) are
written in the following format:

3.2.1.XXX AA B CC DD NAME OF INSTRUCTION [AX]

where AA = the function code (00-FF hex)

B = the format type, 1-E
CC = the number of bits in the operand

    1     single bit
    32    half-words
    64    words
    E     either 32 or 64-bit
    B     both 32 and 64-bit
    NA    operand size not applicable

DD = the instruction type

    Blank Undefined
    BR    Branch
    IN    Index
    MN    Monitor
    NT    Non-Typical
    RG    Register
    SM    Stream

[AX] = the section in the Appendix which gives
further information.
3.2.1.1 00 4 NA MN IDLE

When in monitor mode, enable the external interrupts
and idle until an external interrupt occurs. The R,
S and T designators are undefined and must be set to
zero.

3.2.1.2 01 4 64 NT TRANSMIT (R) TO BACKING STORE MAP
REGISTER AND CURRENT BACKING STORE
MAP REGISTER TO (T); SET AND CLEAR
BUSY FLAGS PER (S)

The Backing Store contains 8192 blocks, each of
32,768 64-bit words. All or any portion of this
Backing Store can be assigned to the user presently
residing in the CPU by setting the Block Base
Address (BBA) and Block Field Length (BFL) in the
backing store map register. When in job mode, all
backing store addresses sent to the Swap Unit have
the BBA added to their values to form an absolute
backing store address. All monitor mode and I/O
references are made as absolute references without
the addition of the BBA.

The BBA is contained in bits 48 through 63 of
register R, while the BFL is contained in bits 32
through 47 of register R. Register T at the
completion of this instruction contains the current
values of BBA and BFL transmitted from the backing
store map register in bits 32 through 63, while the
upper bits (0 through 31) contain the block number
and number of contiguous blocks that have been set
busy in the Backing Store (as a result of an I/O
operation, SWAP operation or monitor mode force busy
operation). Bits 0 through 15 contain the number of
contiguous blocks while bits 16 through 31 contain
the block number of the first block found busy in the
Backing Store, beginning at the block number found in
bits 16 through 31 of register R.

If the contents of bits 0 through 15 or bits 32
through 47 of register S are non-zero, the
instruction also force-sets or force-clears groups
of block busy flags in the Backing Store as follows:

o Bits 0-15 = number of blocks to be forced busy in
the Backing Store

o Bits 16-31 = block number of first block of group
to be set busy

o Bits 32-47 = number of blocks to be forced not
busy in the Backing Store

o Bits 48-63 = block number of first block of group
to be forced not busy

If a force busy and force not busy is attempted on
the same block, or blocks, the result is undefined.

Note that the busy flags for each block can be set
or cleared by the I/O channel and monitor by
software, and by the Swap Unit during BSWAP
transfers. If a BSWAP operation from job mode
specifies a busy block, the BSWAP operation
terminates, setting data flag 95.

The full execution of this instruction as described
is possible only when it is executed in monitor mode.
If the instruction is executed in job mode, only
the block busy information is transferred to bits 0
through 31 of register T with bits 16 through 31 of
register R specifying the beginning block number in
finding the first busy block.
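
The field positions given for registers R and S of the 01 instruction can be illustrated with a small packing routine. This is a sketch only: it assumes that bit 0 is the left-most (most significant) bit of a 64-bit register, which is consistent with the "left-most" and "right-most" wording used throughout this specification, and all function names are hypothetical.

```python
# Hypothetical field packing for the 01 instruction operands.
# Assumption: bit 0 is the left-most bit of a 64-bit register.

WORD_BITS = 64


def put_field(word, first_bit, last_bit, value):
    """Return `word` with bits first_bit..last_bit replaced by `value`."""
    width = last_bit - first_bit + 1
    if not 0 <= value < (1 << width):
        raise ValueError("value does not fit in the field")
    shift = WORD_BITS - 1 - last_bit
    mask = ((1 << width) - 1) << shift
    return (word & ~mask) | (value << shift)


def get_field(word, first_bit, last_bit):
    """Extract bits first_bit..last_bit of `word` as an integer."""
    width = last_bit - first_bit + 1
    shift = WORD_BITS - 1 - last_bit
    return (word >> shift) & ((1 << width) - 1)


def map_register_word(bba, bfl):
    """Build register R: BFL in bits 32-47, BBA in bits 48-63."""
    return put_field(put_field(0, 32, 47, bfl), 48, 63, bba)


def busy_request_word(set_count=0, set_first=0, clear_count=0, clear_first=0):
    """Build register S for forcing block busy flags set or clear."""
    word = put_field(0, 0, 15, set_count)
    word = put_field(word, 16, 31, set_first)
    word = put_field(word, 32, 47, clear_count)
    return put_field(word, 48, 63, clear_first)
```

For instance, `busy_request_word(set_count=2, set_first=5)` describes forcing two blocks busy starting at block 5 while leaving the force-not-busy fields zero.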


3.2.1.3 02 4 64 MN TRANSMIT (R) TO CHANNEL (S) AND
CHANNEL (S) TO (T)

Register S contains the number of an I/O channel.

The contents of 64-bit register R are transmitted to
the specified I/O channel (between 0 and 15); at the
same time the specified I/O channel transmits a
64-bit quantity to be stored in register T. The
data being exchanged consists of control information
passed between the monitor mode program and the I/O
channel intelligent processor (PDC). The meaning of
any combination of bits in these exchanged control
words is solely defined by the software protocols
established for the monitor and the PDC.


3.2.1.4 03 ILLEGAL


3.2.1.5 04 4 64 NT BREAKPOINT-MAINTENANCE

The breakpoint instruction transfers R to the
breakpoint register. The breakpoint register is
used as a maintenance and program debugging aid.

Note: Breakpoint will not be sensed on any scalar
memory references in the CDC FMP.

| ////// | Usage Bits | Breakpoint Address | ////// |
 0      8 9         15 16                58 59     63

Bits 0-8 and 59-63 are not used.

The breakpoint address is compared with various
addresses such as the current instruction address,
READ 1 and READ 2 operand addresses, etc. If the
breakpoint address matches one of these addresses
and the proper usage bit is set, bit 47 of the data
flag branch register is set, indicating a breakpoint.
Any combination of usage bits is permissible;
therefore the breakpoint address can be checked
against any or all of the addresses listed below.

The breakpoint register is part of the invisible
package of a job.

Breakpoint Usage Bits 

Bits 9-15 are breakpoint usage bits where if:

a. Bit 9 is set, breakpoint on half-word contents
of the program address register (P) just after
the execution of the instruction at that
location.

b. Bit 10 is set, breakpoint on the READ 1 operand
address for stream, or the read operand on
random addressing instructions.

c. Bit 11 is set, breakpoint on the READ 2 operand
address for a stream instruction.

d. Bit 12 is set, breakpoint on the WRITE 1 address
for a stream instruction or the write operand on
a random addressing instruction.

e. Bit 13 is set, breakpoint on the READ 3 control
vector or operand address (mask) for a stream
instruction.

f. Bit 14 is set, breakpoint on the READ 1 order
vector address.

g. Bit 15 is set, breakpoint on the READ 2 order
vector address.

Breakpoint Compares

1. When in job mode, addresses are compared with
breakpoint.

2. When in monitor mode, absolute addresses are
compared with breakpoint. Since the monitor
program does not have an invisible package, the
breakpoint register must be set up each time the
monitor program is entered. The breakpoint
register is automatically cleared to zero during
the exchange to the monitor.

3. Program address compares are made on half-word
boundaries, and all other compares are made on
word boundaries.

Data flag: bit 47

3.2.1.6 05 ILLEGAL
3.2.1.7 06 7 NA MN FAULT TEST - MAINTENANCE [A11.0]

This instruction is used to complement checkword
bits on the scalar write bus in order that the read
SECDED circuitry may be checked out. It can also
be used to disable the error correction circuitry on
all read buses. This allows data to be passed
through the SECDED hardware without any correction
taking place.

This instruction is always enabled during monitor
mode. In job mode it becomes a no op unless bit 13
of word 8 in the job's invisible package is set.

The modes are set up by executing this instruction
with a "1" in the appropriate R designator bit and
are cleared by executing the instruction with a
"0" in the same bit location.

The R designator bits are defined below:

R DESIGNATOR BIT

8       Disable error correction on all
        read buses.

9-15    Checkword bits to be
        complemented.

Programmer Note: These bits must be set to zero
before any monitor to job exchange operation. If
these bits are not set to zero via an 06 instruction,
the connection network could produce invalid data on
the read and invalid data could be written into
memory.

The S and T designators are undefined.

A description of each of these faults can be found
in specification 10354637, CDC FMP Functional
Computer Specification.


SECDED FAULTS

The test is initiated by executing an 06 instruction
with bits 9 through 15 of the R designator selected
to complement the respective checkword bits of
half-words 0, 1, 2, and 3 on the write scalar bus to
Main Memory. By appropriate selection of data
bits and complementation of checkword bits when
writing in memory, one should be able to generate
SECDED faults on all read buses. This should allow
complete checking of the read SECDED hardware and
also the fault recording hardware for type and
address of the fault.

The forced complementing of the checkword bits is
discontinued by executing an 06 instruction with
bits 9 through 15 of the R designator cleared to zero.

This description explains the way the 06 instruction
is executed on the CDC FMP.


3.2.1.8 07 ILLEGAL

3.2.1.9 08 4 NA MN INPUT/OUTPUT PER R

When in monitor mode, activate the channel flag
designated by the R designator and exit to the next
sequential instruction. If the R designator
specifies a non-existent channel, the operation
of this instruction is undefined.

The S and T designators are undefined and must be
set to zero.

3.2.1.10 09 4 64 BR EXIT FORCE

From a job to the monitor: Exchange to the monitor
program. A hardware branch is then taken to the
address defined by the right-most 48 bits of the
monitor's register 5. For this case, the R, S and T
designators are undefined and must be set to zero.

From the monitor to a job: Exchange to the job
whose invisible package is located starting at the
absolute bit address contained in register T. For
this case, the R designator is undefined and must be
set to zero. If either the S designator or the
contents of register S are equal to zero, the job's
register file and the monitor's register file are
identical.




3.2.1.11 0A 4 64 MN TRANSMIT (R) TO MONITOR INTERVAL
TIMER

When in monitor mode, transmit bits 40 through 63 of
64-bit register R to the monitor interval timer
(see Section 3.1.8). The left-most 40 bits of
register R are ignored. The S and T designators are
undefined and must be set to zero.

3.2.1.12 0B ILLEGAL

3.2.1.13 0C ILLEGAL

3.2.1.14 0D ILLEGAL
3.2.1.15 0E 4 64 MN TRANSLATE EXTERNAL INTERRUPT [A9.0]

Each bit in the external interrupt register (EIR) is
associated with an external I/O channel, or the
monitor interval timer.

External Interrupt
Register Bit           Assignment

     0                 I/O Channel 0
     1                 I/O Channel 1
     2                 I/O Channel 2
     3                 I/O Channel 3
     4                 I/O Channel 4
     5                 I/O Channel 5
     6                 I/O Channel 6
     7                 I/O Channel 7
     8                 I/O Channel 8
     9                 I/O Channel 9
    10                 I/O Channel 10
    11                 I/O Channel 11
    12                 I/O Channel 12
    13                 I/O Channel 13
    14                 I/O Channel 14
    15                 I/O Channel 15
    16                 Monitor Interval Timer

Translate the lowest numbered bit set in the EIR
into its associated four-bit code and transmit this
code to the right-most four bits of register T. The
left-most 60 bits of register T are cleared to zero.

Examine the EIR and if only one bit is set, the
branch condition is met. The branch, if taken, is
to (S) + (R) where (S) is an index in half-words and
(R) is the base address.

The exit, be it a branch or not, clears the bit (and
only that bit) in the EIR corresponding to the
channel designator which was transmitted to
register T.

If the T and S designators are equal, the
interrupting channel designator will also be
the branch index.

Bit zero of the EIR will never be set as it is
reserved for maintenance purposes.

If no bit in the EIR is set, this instruction sets
T to all zeros and no branch is taken.
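
The behavior of the 0E instruction can be summarized in a few lines of code. The sketch below is only an illustration: the EIR is modeled as a set of bit numbers, the function name is hypothetical, and the value returned for register T is simply the bit number, since the text does not spell out how bit 16 (the monitor interval timer) is encoded in a four-bit code.

```python
# Hypothetical model of TRANSLATE EXTERNAL INTERRUPT (0E).
# The EIR is a set of bit numbers (0 through 16) that are currently one.


def translate_external_interrupt(eir_bits):
    """Return (t_value, branch_condition_met, remaining_eir_bits)."""
    if not eir_bits:
        # No bit set: T is all zeros and no branch is taken.
        return 0, False, set()
    lowest = min(eir_bits)            # lowest numbered bit wins
    branch = len(eir_bits) == 1       # branch only if exactly one bit is set
    remaining = set(eir_bits) - {lowest}  # only the translated bit is cleared
    return lowest, branch, remaining
```

A monitor interrupt routine would typically repeat this step until the remaining set is empty, handling one channel per pass.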
3.2.1.16 0F ILLEGAL

3.2.1.17 10 A 64 RG CONVERT BCD TO BINARY, FIXED LENGTH

Convert the packed BCD number in register R to a
signed (two's complement) binary number and place
the result into the right-most 48 bits of register
T. The conversion is undefined for binary results
greater than 2^47 - 1 or less than -(2^47 - 1); thus the
largest decimal number that may be converted is
±140,737,488,355,327. The ASCII/EBCDIC sign code
for the BCD number is in bits 60-63 of register R.

Data flag bit 39 will be set for numbers outside
this range.

If the input number is not a valid BCD number, the
results are undefined. Bits 0-15 of register T will
be cleared to zero.
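
The conversion performed by the 10 instruction can be sketched as follows. The sketch assumes that register R holds fifteen 4-bit digits in bits 0-59 (most significant digit first) with the sign code in bits 60-63. Because the ASCII and EBCDIC sign code values are not listed in this section, the sign is passed in separately as a hypothetical `negative` argument, and out-of-range input raises an exception where the hardware would set data flag bit 39.

```python
# Hypothetical sketch of packed BCD to binary conversion (instruction 10).

MAX_MAGNITUDE = 2**47 - 1  # 140,737,488,355,327
DIGITS = 15                # 15 digits of 4 bits, plus a 4-bit sign code


def packed_bcd_to_int(register, negative=False):
    """Convert a 64-bit packed BCD register image to a signed integer."""
    digits = (register >> 4) & ((1 << 60) - 1)  # drop the sign code nibble
    value = 0
    for position in range(DIGITS - 1, -1, -1):  # most significant digit first
        nibble = (digits >> (4 * position)) & 0xF
        if nibble > 9:
            raise ValueError("not a valid BCD digit")  # hardware result undefined
        value = value * 10 + nibble
    if value > MAX_MAGNITUDE:
        raise OverflowError("outside the convertible range")  # data flag bit 39
    return -value if negative else value
```

The limit of 15 digits and the bound of 2^47 - 1 both come from the text; everything else in the sketch is an assumption made for illustration.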

3.2.1.18 11 A 64 RG CONVERT BINARY TO BCD, FIXED LENGTH 

Convert the right-most 48 bits (two's complement
binary number) of register R to a packed BCD number
and place the result in register T. The result is
a number having 15 digits (4 bits per digit plus the
sign in the lower bits - bits 60-63). The binary
range is ±(2^47 - 1). During job mode, the sign bits
generated are conditioned by the ASCII/EBCDIC bit
in the job's invisible package. During monitor mode,
only ASCII codes will be generated.

3.2.1.19 12 7 64 NT LOAD BYTE; (T) PER (S), (R) 

3.2.1.20 13 7 64 NT STORE BYTE; (T) PER (S), (R) 

Load/store a byte from/into the address specified by
(R) + (S), where (R) is the base address and (S) is
an item count of bytes, into/from bits 56 through 63
of register T. The remaining bits of T are cleared
on a load and ignored on a store.

3.2.1.21 14 ILLEGAL 


3.2.1.22 15 ILLEGAL 

3.2.1.23 16 ILLEGAL 

3.2.1.24 17 ILLEGAL 

3.2.1.25 18 ILLEGAL 

3.2.1.26 19 ILLEGAL 

3.2.1.27 1A ILLEGAL 

3.2.1.28 1B ILLEGAL 

3.2.1.29 1C ILLEGAL 

3.2.1.30 1D ILLEGAL 

3.2.1.31 1E ILLEGAL 

3.2.1.32 1F ILLEGAL 

3.2.1.33 20 7 64 RG SHIFT (R) AND (R+1) PER S TO (T) AND (T+1) 

This instruction shifts the 128-bit operand formed by 
catenating the contents of register R and register 
R+1 (bit 0 of register R+1 follows bit 63 of register 
R) and stores the results into the register 
designated by T and the next sequential register 
(T+1). The S designator specifies the type and 
amount of shift. If the S designator is in the 
range from 0 through 7F hexadecimal (0 through 127 
decimal), the 128-bit operand is shifted left 
end-around the specified number of places. If the S 
designator is in the range from FF through 81 
hexadecimal (-1 through -127 decimal), the 128-bit 
operand is shifted right with sign extension. For 
this case, bit zero of the operand from register R is 
considered to be the sign bit of the shifted operand. 
The number of right shifts is equal to the two's 
complement of the S designator. If, for example, S is 
equal to FE hexadecimal, the operand shifts right two 
places. If the S designator is greater than 7F or 
less than 81 hexadecimal, the results of this 
instruction are undefined. The R designator must 
specify an even register number. If the R designator 
is equal to zero, register zero will provide machine 
zero. This instruction does not test for machine zero 
or indefinite or set any data flags. 

3.2.1.34 21 7 64 RG SHIFT (R) AND (R+1) PER (S) TO (T) AND (T+1) 

This instruction shifts the 128-bit operand formed by 
catenating the contents of register R and register 
R+1 (bit 0 of register R+1 follows bit 63 of register 
R) and stores the results into the register 
designated by T and the next sequential register 
(T+1). The contents of the register designated by S 
determine the type and amount of shift. If the 
right-most byte of register S is in the range from 0 
through 7F hexadecimal (0 through 127 decimal), the 
128-bit operand is shifted left end-around the 
specified number of places. If the right-most byte of 
register S is in the range from FF through 81 
hexadecimal (-1 through -127 decimal), the 128-bit 
operand is shifted right with sign extension. For 
this case, bit zero of the operand from register R is 
considered to be the sign bit of the shifted operand. 
The number of right shifts is equal to the two's 
complement of the right-most byte of register S. If 
the right-most byte of register S is greater than 7F 
or less than 81 hexadecimal, the results of this 
instruction are undefined. The left-most seven bytes 
of register S are ignored. 

The R designator must specify an even register 
number. If the R designator is equal to zero, 
register zero will provide machine zero. This 
instruction does not cause a test for machine zero or 
indefinite or set any data flags. 



CONTROL DATA Corporation   ENGINEERING SPECIFICATION NO. 10354636   DATE Dec. 1977   REV. A 



3.2.1.35 22 ILLEGAL 

3.2.1.36 23 ILLEGAL 

3.2.1.37 24 ILLEGAL 

3.2.1.38 25 ILLEGAL 

3.2.1.39 26 ILLEGAL 

3.2.1.40 27 ILLEGAL 

3.2.1.41 28 ILLEGAL 

3.2.1.42 29 ILLEGAL 

3.2.1.43 2A ILLEGAL 

3.2.1.44 2B 4 64 RG ADD TO LENGTH FIELD 

Add bits 00 through 15 of register R to bits 48 
through 63 of S and store the result in bits 00 
through 15 of register T. Bits 16 through 63 of 
register R are transferred to bits 16 through 63 of 
register T. 
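
The field arithmetic above can be sketched in Python. This is an illustration only: the function name and the treatment of registers as plain 64-bit integers (bit 0 left-most) are assumptions, and the wrap-around of the 16-bit sum is assumed because the text does not say what happens on overflow.

```python
MASK16 = 0xFFFF
MASK48 = (1 << 48) - 1

def add_to_length_field(r, s):
    """Return the new contents of T for 64-bit register values R and S."""
    # Bits 00-15 of R are the top 16 bits; bits 48-63 of S are the bottom 16.
    length = ((r >> 48) + (s & MASK16)) & MASK16  # assumed to wrap at 16 bits
    # Bits 16-63 of R pass through to T unchanged.
    return (length << 48) | (r & MASK48)
```

For instance, a length field of 3 in R and an increment of 5 in S gives a length field of 8 in T, with the lower 48 bits of R carried over.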




3.2.1.45 2C 4 64 RG LOGICAL EXCLUSIVE OR (R),(S) TO (T) 

3.2.1.46 2D 4 64 RG LOGICAL AND (R),(S) TO (T) 

3.2.1.47 2E 4 64 RG LOGICAL INCLUSIVE OR (R),(S) TO (T) 

These instructions perform the indicated logical 
functions listed below. The function occurs bit by 
bit on the 64-bit operands contained in the registers 
designated by R and S. The result in each case is 
stored in the register designated by T. 

R   S   EXCLUSIVE OR   AND   INCLUSIVE OR 
        R,S            R,S   R,S 

0   0        0          0         0 
0   1        1          0         1 
1   0        1          0         1 
1   1        0          1         1 

If the R or S designator equals zero, register zero 
will contain machine zero. 

3.2.1.48 2F 9 1 BR REGISTER BIT BRANCH AND ALTER 

This instruction examines bit 63 of register T. As 
specified by the G designator, a branch is made to 
the address contained in the right-most 48 bits of 
register S. The branch is made according to G bits 
0 and 1 as follows: 

G0 G1 

0 0 do not branch 

0 1 branch unconditionally 

1 0 branch if the object bit was a one 

1 1 branch if the object bit was a zero 

After the branch decision has been made and 
regardless of what that decision was, the object bit 
is altered according to G bits 2 and 3 as follows: 

G2 G3 

0 0 do not alter the object bit 

0 1 toggle the object bit to the other state 

1 0 set the object bit to a one 

1 1 clear the object bit to a zero 
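
The two G-bit tables can be read as a small decision procedure. The Python sketch below is illustrative, not the hardware definition: the function name is hypothetical, the G bits are passed as four separate 0/1 values, and register T is modelled as a 64-bit integer whose bit 63 is the right-most bit.

```python
def register_bit_branch_and_alter(t, g0, g1, g2, g3):
    """Return (branch_taken, new_t) for instruction 2F."""
    obj = t & 1  # bit 63 of T is the right-most bit

    # Branch decision from G bits 0 and 1.
    if (g0, g1) == (0, 0):
        branch = False
    elif (g0, g1) == (0, 1):
        branch = True
    elif (g0, g1) == (1, 0):
        branch = obj == 1
    else:
        branch = obj == 0

    # Alteration from G bits 2 and 3, applied regardless of the decision.
    if (g2, g3) == (0, 0):
        new_obj = obj
    elif (g2, g3) == (0, 1):
        new_obj = obj ^ 1
    elif (g2, g3) == (1, 0):
        new_obj = 1
    else:
        new_obj = 0

    return branch, (t & ~1) | new_obj
```

The branch decision always uses the object bit as it was before alteration, which is why `obj` is read once at the top.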
3.2.1.49 30 7 64 RG SHIFT (R) PER S TO (T) 

This instruction shifts the 64-bit operand from the 
register designated by R and stores the result into 
the register designated by T. The S designator 
specifies the type and amount of the shift. If the 
S designator is in the range from 0 through 3F 
hexadecimal (0 through 63 decimal), the operand from 
register R is shifted left end-around the specified 
number of places and then stored in register T. If 
the S designator is in the range from FF through C1 
hexadecimal (-1 through -63 decimal), the operand 
from register R is shifted right with sign extension 
and then stored into register T. For this case, bit 
zero of the operand from register R is considered to 
be the sign bit of the shifted operand. The number of 
right shifts is equal to the two's complement of the 
S designator. If, for example, S is equal to FE 
hexadecimal, the operand from register R shifts right 
two places. If the S designator is greater than 3F or 
less than C1 hexadecimal, the results of this 
instruction are undefined. 

If the R designator is equal to zero, register zero 
will provide machine zero. This instruction does not 
test for machine zero or indefinite or set any data 
flags. 
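
A minimal Python sketch of the 64-bit shift rule follows. The function name is hypothetical, the register is modelled as an unsigned 64-bit integer with bit 0 as the most significant bit, and the undefined range is reported with an exception purely for illustration (the specification only says the results are undefined).

```python
MASK64 = (1 << 64) - 1

def shift64(r, s):
    """Shift 64-bit value r as directed by the 8-bit shift designator s."""
    r &= MASK64
    s &= 0xFF
    if s <= 0x3F:
        # Left end-around shift (rotate left) by s places.
        return ((r << s) | (r >> (64 - s))) & MASK64
    if s >= 0xC1:
        # Right shift with sign extension; count is the two's complement of s.
        count = 0x100 - s
        result = r >> count
        if r >> 63:  # bit 0 (left-most) is the sign bit
            result |= MASK64 ^ (MASK64 >> count)
        return result
    raise ValueError("shift designator in the undefined range 40-C0 hex")
```

With S = FE hexadecimal the count is 2, matching the example in the text.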


3.2.1.50 31 7 64 BR INCREASE(R) AND BRANCH IF(R) <> 0 

Increment the contents of the right-most 48 bits of 
register R by one. The upper 16 bits of register R 
are not altered and arithmetic overflow is ignored. 

If the result from above is 48 zeros, go to the next 
sequential instruction. If the 48-bit result from 
above is non-zero, branch to (S) + (T) where (S) is 
an item count of half-words and (T) is the base 
address. The resulting address for the branch is 
undefined if the R designator is equal to either the 
S designator or the T designator. 
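
The counting behaviour of this instruction can be sketched as follows. The sketch models only the update of register R and the branch decision; the function name is hypothetical, and the formation of the branch address from (S) and (T) is deliberately left out.

```python
MASK48 = (1 << 48) - 1
UPPER16 = 0xFFFF << 48

def increase_and_test(r):
    """Return (new_r, branch_taken) for a 64-bit register value r."""
    low = ((r & MASK48) + 1) & MASK48   # overflow out of 48 bits is ignored
    new_r = (r & UPPER16) | low         # upper 16 bits are not altered
    return new_r, low != 0              # branch only when the result is non-zero
```

A loop counter loaded with a negative 48-bit count therefore branches until the count reaches zero, at which point execution falls through to the next sequential instruction.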
3.2.1.51 32 9 1 BR BIT BRANCH AND ALTER 

Register S contains the address of the object bit. 
This instruction reads up the word containing the 
object bit and examines the bit. The branch is then 
made according to G bits 0 and 1: 

G0 G1 

0 0 do not branch 

0 1 branch unconditionally 

1 0 branch if the object bit was a one 

1 1 branch if the object bit was a zero 

After the branch decision has been made and 
regardless of what that decision was, the object bit 
is altered according to G bits 2 and 3 as follows: 

G2 G3 

0 0 do not alter the object bit 

0 1 toggle the object bit to the other state 

1 0 set the object bit to a one 

1 1 clear the object bit to a zero 

NOTE: If G0 and G2 and G3 = 0, do not reference the 
object bit at all. 

If (G0 = 1) and (G2 and G3 = 0), read, but do 
not write, the object bit. 

G bit 5 = 0     Register T contains the branch 
                address. 

G bit 5 = 1 }   Branch address is formed by 
G bit 6 = 0 }   adding the T designator, used as 
                a half-word item count, to the 
                program address register. 

G bit 5 = 1 }   Branch address is formed by 
G bit 6 = 1 }   subtracting the T designator, 
                used as a half-word item count, 
                from the program address register. 

3.2.1.52 33 8 1 BR DATA FLAG REGISTER BIT BRANCH AND ALTER 

I is a six-bit designator specifying an object bit in 
the data flag register. 

The object bit in the data flag register is examined 
and the decision to branch is made according to G 
bits 0 and 1. 

G0 G1 

0 0 do not branch 

0 1 branch unconditionally 

1 0 branch if the object bit was a one 

1 1 branch if the object bit was a zero 

After the branch decision has been made and 
regardless of what that decision was, the object bit 
is altered according to G bits 2 and 3 as follows: 

G2 G3 

0 0 do not alter the object bit 

0 1 toggle the object bit to the other 

state 

1 0 set the object bit to a one 

1 1 clear the object bit to a zero 

Programmer Note: It is meaningless to try to alter 
bits in the product field (bits 0-15) since the 
product field is strictly a function of the 
appropriate data flag and flag mask bits. 

Since the 33 instruction begins execution without 
waiting until the machine has completed all 
operations, the data flag bits may set on any minor 
cycle during execution of the 33 instruction. 
Therefore, the object bit is sampled 2 minor cycles 
after the 33 instruction is loaded into IRQ. This 
sampled object bit, rather than the actual object 
bit, is used to control the decision to branch, and 
the altering of the actual object bit in the data 
flag register. Consequently, any data flag bits 
that set after the object bit is sampled will not 
affect the decision to branch. Also, if the sampled 
object bit is a zero, any data flag bits that set 
afterwards will not be cleared nor toggled to a zero. 

G bit 5 = 0     Register T contains the branch 
                address. 

G bit 5 = 1 }   Branch address is formed by 
G bit 6 = 0 }   adding the T designator, used as 
                an item count, in half-words, to 
                the program address register. 

G bit 5 = 1 }   Branch address is formed by 
G bit 6 = 1 }   subtracting the T designator, 
                used as an item count, in 
                half-words, from the program 
                address register. 

3.2.1.53 34 4 64 RG SHIFT (R) PER (S) TO (T) 

This instruction shifts the 64-bit operand from the 
register designated by R and stores the result into 
the register designated by T. The register 
designated by S specifies the type and amount of the 
shift. If the right-most byte of register S is in 
the range from 0 through 3F hexadecimal (0 through 63 
decimal), the operand from register R is shifted left 
end-around the specified number of places and then 
stored into register T. If the right-most byte of 
register S is in the range from FF through C1 
hexadecimal (-1 through -63 decimal), the operand 
from register R is shifted right with sign extension 
and then stored into register T. For this case, bit 
zero of the operand from register R is considered to 
be the sign bit of the shifted operand. The number of 
right shifts is equal to the two's complement of the 
right-most byte of register S. If the right-most byte 
of register S is greater than 3F or less than C1 
hexadecimal, the results of this instruction are 
undefined. 

The left-most seven bytes of register S are ignored. 

If the R designator is equal to zero, register zero 
will provide machine zero. This instruction does not 
cause a test for machine zero or indefinite or set 
any data flags. 

3.2.1.54 35 7 64 BR DECREASE (R) AND BRANCH IF (R) <> 0 

Decrement the contents of the right-most 48 bits of 
register R by one. The upper 16 bits of register R 
are not altered and arithmetic overflow is ignored. 

If the result from above is 48 zeros, go to the next 
sequential instruction. If the 48-bit result from 
above is non-zero, branch to (S) + (T) where (S) is 
an item count of half-words and (T) is the base 
address. The resulting address for the branch is 
undefined if the R designator is equal to either 
the S designator or the T designator. 

3.2.1.55 36 7 64 BR BRANCH AND SET (R) TO NEXT INSTRUCTION 

After storing the address of the next sequential 
instruction into register R, branch to (S) + (T) 
where (S) is an item count of half-words and (T) is 
the base address. Bits 0 through 15 of register R are 
forced to zeros. Bits 59 through 63 of register R are 
undefined. If the R designator is equal to the S 
designator, the results of this instruction are 
undefined. 

NOTE: If S=0 and R=T, this instruction sets 
register R to the half-word address of the next 
instruction and the program continues at the next 
instruction. This is a way to sample the program 
address register (P). 

3.2.1.56 37 A 64 NT TRANSMIT JOB INTERVAL TIMER TO (T) 

Transmit the contents of the job interval timer into 
bits 40-63 of register T. Bits 0-39 are cleared to 
zero. The R and S designators are undefined and must 
be set to zero. This instruction does not deactivate 
the timer. 

When executed in monitor mode, the operation of this 
instruction is undefined. 

3.2.1.57 38 A 64 IN TRANSMIT (R BITS 00-15) TO (T BITS 00-15) 

Replace the left-most 16 bits of register T with the 
left-most 16 bits of register R. 

3.2.1.58 39 A 64 NT TRANSMIT REAL-TIME CLOCK TO (T) 

Transmit the contents of the real-time clock to bits 
16 through 63 of register T. Bits 00 through 15 are 
cleared. R and S must be zero. 

3.2.1.59 3A A 64 NT TRANSMIT (R) TO JOB INTERVAL TIMER 

When executed in job mode, this instruction transmits 
bits 40 through 63 of 64-bit register R to the job 
interval timer. S and T must be zero. (See Sections 
3.1.6.3 and 3.1.8.3.) 

When executed in monitor mode, this instruction 
performs as a no-op. 

3.2.1.60 3B A 64 BR DATA FLAG REGISTER LOAD/STORE 

Transfer the contents of register R to the data flag 
register and the original contents of the data flag 
register to register T. The S designator is 
undefined and must be set to zero. The R and T 
designators may be the same and this will swap data 
flag packages. 

NOTE: An immediate data flag branch results at the 
termination of this instruction if the new 
contents of the data flag register meet the 
appropriate conditions. 

3.2.1.61 3C 4 32 NT HALF-WORD INDEX MULTIPLY (R)*(S) TO (T) 

The right-most 24 bits of registers R and S contain 
signed, two's complement integers. Their product is 
formed and stored into the right-most 24 bits of 
register T. The left-most 8 bits of register T are 
cleared to zeros. 

If the product or either operand exceeds the value 
±(2^23 - 1), the result is undefined. 

3.2.1.62 3D 4 64 NT INDEX MULTIPLY (R)*(S) TO (T) 

The right-most 48 bits of registers R and S contain 
signed, two's complement integers. Their product is 
formed and stored into the right-most 48 bits of 
register T. The left-most 16 bits of register T are 
cleared to zeros. 

If the product or either operand exceeds the value 
±(2^47 - 1), the result is undefined. 

3.2.1.63 3E 6 64 IN ENTER (R) WITH I (16 BITS) 

Clear register R and transfer the right-most 16 bits 
of this instruction to the right-most 48 bits of 
register R (the sign of the 16-bit immediate operand 
is extended through bit 16). 

3.2.1.64 3F 6 64 IN INCREASE (R) BY I (16 BITS) 

Replace the right-most 48 bits of register R by the 
sum of those bits and the right-most 16 bits of this 
instruction (the sign of the 16-bit immediate operand 
is extended through bit 16 for the addition). 
Arithmetic overflow is ignored. 
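
The sign extension used by the 3E and 3F instructions can be sketched in Python as below. The helper names are hypothetical and registers are modelled as 64-bit integers; the sketch assumes, as the text of 3F implies, that the left-most 16 bits of R are left alone by the increase.

```python
MASK48 = (1 << 48) - 1
UPPER16 = 0xFFFF << 48

def sign_extend_16_to_48(i):
    """Extend a 16-bit two's complement immediate through bit 16 (48 bits)."""
    i &= 0xFFFF
    if i & 0x8000:
        return i | (MASK48 ^ 0xFFFF)
    return i

def enter_r(i):
    """3E: clear R, then load the sign-extended immediate into bits 16-63."""
    return sign_extend_16_to_48(i)

def increase_r(r, i):
    """3F: add the sign-extended immediate to bits 16-63; overflow ignored."""
    low = ((r & MASK48) + sign_extend_16_to_48(i)) & MASK48
    return (r & UPPER16) | low
```

An immediate of FFFF hexadecimal therefore acts as -1: it enters 48 one bits, or decrements the 48-bit field by one.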


3.2.1.65 40 4 32 RG ADD U; (R)+(S) TO (T) 

3.2.1.66 41 4 32 RG ADD L; (R)+(S) TO (T) 

3.2.1.67 42 4 32 RG ADD N; (R)+(S) TO (T) 

3.2.1.68 43 ILLEGAL 

3.2.1.69 44 4 32 RG SUB U; (R)-(S) TO (T) 

3.2.1.70 45 4 32 RG SUB L; (R)-(S) TO (T) 

3.2.1.71 46 4 32 RG SUB N; (R)-(S) TO (T) 

3.2.1.72 47 ILLEGAL 

3.2.1.73 48 4 32 RG MPY U; (R)*(S) TO (T) 

3.2.1.74 49 4 32 RG MPY L; (R)*(S) TO (T) 

3.2.1.75 4A ILLEGAL 

3.2.1.76 4B 4 32 RG MPY S; (R)*(S) TO (T) 

3.2.1.77 4C 4 32 RG DIV U; (R)/(S) TO (T) 


These instructions perform the indicated floating- 
point arithmetic operation on the 32-bit floating- 
point operands contained in the registers designated 
by R and S. The result in each case is stored in the 
register designated by T. 

U signifies that the upper result of the operation is 
returned; L signifies the lower result; S signifies 
the significant result; and N signifies the 
normalized upper result. 

Data flags: bits 42, 43 and 46 

3.2.1.78 4D 6 32 IN HALF-WORD ENTER R WITH I (16 BITS) 

Clear register R and transfer the right-most 16 bits 
of this instruction to the right-most 24 bits of 
register R (the sign of the 16-bit immediate operand 
is extended through bit 8). 

3.2.1.79 4E 6 32 IN HALF-WORD INCREASE R BY I (16 BITS) 

Replace the right-most 24 bits of register R by the 
sum of those bits and the right-most 16 bits of this 
instruction (the sign of the 16-bit immediate operand 
is extended through bit 8 for the addition). 
Arithmetic overflow is ignored. 

3.2.1.80 4F 4 32 RG DIV S; (R)/(S) TO (T) 

This instruction performs a divide significant 
operation on the 32-bit floating-point operands 
contained in the registers designated by R and S. 

The result is stored in the register designated by T. 

Data flags: bits 41, 42, 43 and 46 

3.2.1.81 50 A 32 RG TRUNCATE; (R) TO (T) 

Transmit to destination register T the nearest 
integer whose magnitude is less than or equal to the 
32-bit floating-point operand in origin register R. 
This integer is represented as an unnormalized 32-bit 
floating-point number having a positive exponent. 

If the exponent of the source operand is positive 
(greater than or equal to zero), the operand is 
transmitted directly to the destination register. 

If the exponent of the source operand is negative, 
the magnitude of the coefficient is shifted right end 
off, and the exponent is increased by one for each 
bit position shifted until the exponent becomes zero. 
Zeros are extended on the left during the shift. If 
the coefficient of the source operand is positive, 
the shifted coefficient with zero exponent is entered 
into the destination register. If the coefficient of 
the source operand is negative, the two's complement 
of the shifted coefficient with zero exponent is 
entered into the destination register. 

If machine zero is used as an operand, 32 zeros are 
returned as a result. 

Data flag: bit 46 

3.2.1.82 51 A 32 RG FLOOR; (R) TO (T) 

Transmit to destination register T the nearest 
integer less than or equal to the 32-bit floating- 
point operand in origin register R. This integer is 
represented as an unnormalized 32-bit floating-point 
number having a positive exponent. 

If the source operand's exponent is positive (greater 
than or equal to zero), the operand is transmitted 
directly to the destination register. 

If the exponent of the source operand is negative, 
the coefficient is shifted right end off and the 
exponent is increased by one for each bit position 
shifted until the exponent becomes zero. Sign bits 
are extended on the left during the shift. The 
shifted coefficient with zero exponent is entered 
into the destination register. 

If machine zero is used as an operand, 32 zeros are 
returned as a result. 

Data flag: bit 46 

3.2.1.83 52 A 32 RG CEILING; (R) TO (T) 

Transmit to destination register T the nearest 
integer greater than or equal to the 32-bit floating- 
point operand in origin register R. This integer is 
represented as an unnormalized 32-bit floating-point 
number having a positive exponent. 

If the source operand's exponent is positive (greater 
than or equal to zero), the operand is transmitted 
directly to the destination register. 

If the exponent of the source operand is negative, 
the two's complement of the coefficient is shifted 
right end off and the exponent is increased by one 
for each bit position shifted until the exponent 
becomes zero. Sign bits are extended on the left 
during the shift. The two's complement of the shifted 
coefficient with zero exponent is entered into the 
destination register. 

If machine zero is used as an operand, 32 zeros are 
returned as a result. 

Data flag: bit 46 
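
The three rounding rules of 3.2.1.81 through 3.2.1.83 differ only in how the coefficient is shifted. The Python sketch below illustrates this under stated assumptions: an operand is modelled as a pair (exponent, coefficient) of ordinary signed integers with value coefficient x 2^exponent, rather than as a packed 32-bit register, and the special handling of machine zero and indefinite operands is omitted. The function names are hypothetical.

```python
def truncate(exponent, coefficient):
    """Shift the magnitude right, then restore the sign (round toward zero)."""
    if exponent >= 0:
        return exponent, coefficient
    magnitude = abs(coefficient) >> -exponent
    return 0, -magnitude if coefficient < 0 else magnitude

def floor(exponent, coefficient):
    """Shift right with sign extension (round toward minus infinity)."""
    if exponent >= 0:
        return exponent, coefficient
    return 0, coefficient >> -exponent  # Python's >> extends the sign

def ceiling(exponent, coefficient):
    """Complement, shift with sign extension, complement again."""
    if exponent >= 0:
        return exponent, coefficient
    return 0, -((-coefficient) >> -exponent)
```

For the operand -5 x 2^-1 (that is, -2.5) the three functions return coefficients -2, -3 and -2 respectively, each with a zero exponent.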


3.2.1.84 53 A 32 RG SIGNIFICANT SQUARE ROOT; (R) TO (T) 

Transmit to 32-bit register T the square root of a 
32-bit floating-point operand in register R. 

Data flags: bits 43, 45 and 46 

3.2.1.85 54 4 32 RG ADJUST SIGNIFICANCE; (R) PER (S) TO (T) 

Adjust the significance of the floating-point operand 
in register R and transmit it to result register T. 

A signed, two's complement integer is contained in 
the right-most 24 bits of register S. The absolute 
value of this integer is a shift count. 

If the shift count is positive, shift the operand's 
coefficient left the number of places specified by 
the shift count or by the number of shifts needed to 
normalize the coefficient, whichever is smaller. In 
either case, the exponent of the operand is reduced 
by one for each place actually shifted. An all-zero 
coefficient will be shifted left the number of places 
specified. 

If the shift count is negative, shift the operand's 
coefficient right the number of places specified by 
the shift count and increase the exponent of the 
operand by one for each place shifted. If R is 
indefinite, T will be indefinite and data flag bit 
46 is set. If R is machine zero, T will be machine 
zero and data flag bit 43 will be set. 

This instruction is undefined if the absolute value 
of the shift count is greater than 23 (decimal). Note 
that the addition of the shift count can cause either 
exponent overflow or exponent underflow. 

Data flags: bits 42, 43 and 46 


3.2.1.86 55 4 32 RG ADJUST EXPONENT; (R) PER (S) TO (T) 

Transmit the adjusted operand from register R to 
result register T. The exponent of the result is set 
equal to the exponent of the operand in register S. 
The coefficient of the result is formed by shifting 
the coefficient of the operand from register R. 

The shift count used is the difference between the 
exponents in registers R and S. If the exponent in 
register R is greater/less than the exponent in 
register S, the shift is to the left/right, 
respectively. For zero coefficients in register R, 
the exponent from register S is copied to register 
T with an all-zero coefficient. 

If a left shift exceeds the number of places required 
for normalization, the result is set to indefinite, 
and data flag bit 42 is set. If either or both 
operands are indefinite or machine zero, the result 
is set to indefinite. In this case, data flag bit 46 
is set and data flag bit 42 is not set. 

Data flags: bits 42 and 46 

3.2.1.87 56 7 64 SM BSWAP; R --> S or S --> T 

Move data from the Backing Store (specified by source 
field R) to Main Memory (specified by S), or move 
data from Main Memory to the Backing Store (specified 
by T). When moving data from the Backing Store to 
Main Memory the T field must be zero; when moving 
data from Main Memory to the Backing Store the R 
field must be zero. 

Bits 8 to 15 of register S specify the number of 
blocks to be transferred from one memory to another; 
one block is 32,768 64-bit words. Only an integral 
number of blocks may be transferred. A value of zero 
for transfer length makes the instruction a no-op. 

The maximum length transfer is 255 blocks. 

Bits 0 to 7, 16 to 34, and 43 to 63 of register S are 
unused. Bits 35 to 42 specify the block base address 
for the start of the transfer from/to Main Memory. 

Bits 0 to 13, 16 to 29, and 43 to 63 of registers R 
and T are unused. Bits 14 and 15 are used only in 
monitor mode to manipulate the backing store block 
busy flags. A one in bit 14 means the blocks of 
Backing Store accessed by the BSWAP instruction will 
remain busy after completion of the swap (normally 
the Backing Store blocks are made busy at the issue 
of the BSWAP instruction and the busy flags are 
cleared block by block as the data transfer 
completes). A one in bit 15 is essentially an override 
of busy flags for blocks accessed by the BSWAP 
instruction. This allows monitor mode to lock down 
blocks by making (and leaving) them busy, yet allows 
monitor access to them. In job mode bits 14 and 15 
are not used and a BSWAP proceeds as though both bits 
were zero -- blocks must not be busy at the start and 
are left not busy at completion. 

Bits 30 to 42 of registers R and T specify the block 
base address for the start of the transfer from or to 
the Backing Store, respectively. 

If the length of the transfer is such that the Swap 
Unit attempts to read or write past the end of 
either memory, instruction results are undefined. 
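
The register fields used by BSWAP can be pulled apart with a small helper. The Python sketch below is illustrative: the function and key names are hypothetical, registers are modelled as 64-bit integers with bit 0 as the left-most bit, and the sample value in the usage note is made up for the illustration.

```python
WORDS_PER_BLOCK = 32768  # one block is 32,768 64-bit words

def field(word, first, last):
    """Extract bits first..last of a 64-bit word (bit 0 is left-most)."""
    width = last - first + 1
    return (word >> (63 - last)) & ((1 << width) - 1)

def decode_s(s):
    """Fields of register S: transfer length and Main Memory block base."""
    return {
        "blocks": field(s, 8, 15),
        "main_memory_block": field(s, 35, 42),
    }

def decode_backing(reg):
    """Fields of register R or T: busy-flag controls and Backing Store base."""
    return {
        "leave_busy": field(reg, 14, 14),     # honoured in monitor mode only
        "override_busy": field(reg, 15, 15),  # honoured in monitor mode only
        "backing_store_block": field(reg, 30, 42),
    }
```

As a usage example, `decode_s(0x0010000010000000)` yields a length of 16 blocks (16 x `WORDS_PER_BLOCK` words) and a Main Memory block base of 80 hexadecimal.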
Examples of 56 usage: 

Register 10     0010 0000 1000 0000 (hexadecimal) 

Register 11     0000 0002 0000 0000 (hexadecimal) 

56111000        Move 10 (hexadecimal) blocks of data from 
                the Backing Store beginning at address 
                200000000 to Main Memory beginning 
                at address 10000000. 

56001011        Move 10 (hexadecimal) blocks of data from 
                Main Memory beginning at address 10000000 
                to the Backing Store beginning at 
                address 200000000. 

3.2.1.88 57 ILLEGAL 

3.2.1.89 58 A 32 RG TRANSMIT; (R) TO (T) 

Transmit the operand in 32-bit register R to 32-bit 
register T. 

3.2.1.90 59 A 32 RG ABSOLUTE; (R) TO (T) 

Transmit the absolute value of the 32-bit floating- 
point operand in register R to register T. 

3.2.1.91 5A A 32 RG EXP.; (R) TO (T) 

Transmit the exponent from the left-most 8 bit 
positions of the origin register R to the right-most 
8 bit positions of destination register T. The sign 
of the exponent is extended through bit 8 of 
destination register T; the left-most 8 bits of the 
destination register are cleared to zeros. 


3.2.1.92 5B 4 32 RG PACK; (R), (S) TO (T) 

Transmit a 32-bit floating-point number to the 
destination register T. The exponent of the number 
is obtained from the right-most 8 bit positions of 
register R and the coefficient is obtained from the 
right-most 24 bit positions of register S. 
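
The EXP and PACK instructions are near inverses, which a short Python sketch makes visible. The function names are hypothetical and 32-bit registers are modelled as unsigned integers with bit 0 as the left-most bit, so the exponent occupies the top 8 bits and the coefficient the bottom 24.

```python
def exp32(r):
    """5A: exponent of R to the right-most 8 bits of T, sign extended."""
    e = (r >> 24) & 0xFF
    if e & 0x80:
        # Sign extended through bit 8; left-most 8 bits stay zero.
        return 0x00FFFF00 | e
    return e

def pack32(r, s):
    """5B: exponent from right-most 8 bits of R, coefficient from S."""
    return ((r & 0xFF) << 24) | (s & 0xFFFFFF)
```

Applying `pack32(exp32(x), x)` to any 32-bit value `x` reproduces `x`, since the exponent is taken from and returned to the same 8 bit positions.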
3.2.1.93 5C A B RG EXTEND; 32-BIT (R) TO 64-BIT (T) 

Extend the floating-point number from 32-bit register 
R into a 64-bit floating-point number and transmit 
the result to 64-bit register T. The value of the 
resulting 16-bit exponent is 24 less than that of 
the origin operand's exponent. The coefficient is 
obtained by transmitting the right-most 24 bits of 
the origin register into bits 16 through 39 of the 
destination register. The right-most 24 bits of the 
destination register are cleared to zero. 

If R is indefinite, T will be indefinite and data 
flag bit 46 will be set. If R is machine zero, T 
will be machine zero and data flag bit 43 will be set. 

Data flags: bits 43 and 46 

3.2.1.94 5D A 8 RG INDEX EXTEND; 32-BIT (R) TO 64-BIT (T) 

Extend the floating-point number from 32-bit register 
R into a 64-bit floating-point number and transmit 
the result to 64-bit register T. The value of the 
resulting 16-bit exponent is the same as the origin 
operand's exponent. The coefficient is obtained by 
transmitting the right-most 24 bits of the origin 
register into bits 40 through 63 of the destination 
register. Bits 16 through 39 of the destination 
register are set to the sign of the origin 
coefficient. 

If R is indefinite, T will be indefinite and data 
flag bit 46 will be set. If R is machine zero, T 
will be machine zero and data flag bit 43 will be set. 

Data flags: bits 43 and 46 


3.2.1.95 5E 7 32 NT LOAD; (T) PER (S), (R) 

3.2.1.96 5F 7 32 NT STORE; (T) PER (S), (R) 

Load/store 32-bit register T from/into the address 
specified by (R) + (S) where (R) is the base address 
and (S) is an item count of half-words. Note that S 
and R are 64-bit registers and that the item count 
is shifted left five places before the addition. 
Overflow from this addition is ignored, if it occurs. 
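
The address formation for the half-word load and store can be sketched as follows. This is a hedged illustration: the function name is hypothetical, and truncating the sum to 48 bits is an assumption made here to model "overflow is ignored", since the text does not state the width of the adder.

```python
MASK48 = (1 << 48) - 1  # assumed address width

def half_word_address(base, item_count):
    """Form (R) + (S) with the half-word item count scaled by 32."""
    # Shifting left five places converts a count of 32-bit half-words
    # into a bit offset.
    return (base + (item_count << 5)) & MASK48
```

For example, a base of 1000 hexadecimal and an item count of 3 gives 1060 hexadecimal, three half-words (96 bits) past the base.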


3.2.1.97 60 4 64 RG ADD U; (R)+(S) TO (T) 

3.2.1.98 61 4 64 RG ADD L; (R)+(S) TO (T) 

3.2.1.99 62 4 64 RG ADD N; (R)+(S) TO (T) 

These instructions perform the indicated floating- 
point arithmetic operation on the 64-bit floating- 
point operands contained in the registers designated 
by R and S. The result in each case is stored in the 
register designated by T. 

U signifies that the upper result of the operation is 
returned; L signifies the lower result; and N 
signifies the normalized upper result. 

Data flags: bits 42, 43 and 46 

3.2.1.100 63 4 64 RG ADD ADDRESS; (R)+(S) TO (T) 

This instruction adds bits 16 through 63 of register 
R to bits 16 through 63 of register S and stores the 
result in bits 16 through 63 of register T. Bits 16 
through 63 are treated as 48-bit, positive, unsigned 
integers. Arithmetic overflow is ignored. Bits 0 
through 15 of register R are transferred without 
modification to bits 0 through 15 of register T. 
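
A minimal Python sketch of the address addition is given below, with registers modelled as 64-bit integers (bit 0 left-most). The function name is hypothetical.

```python
MASK48 = (1 << 48) - 1
UPPER16 = 0xFFFF << 48

def add_address(r, s):
    """Add bits 16-63 of R and S as unsigned 48-bit integers."""
    low = ((r & MASK48) + (s & MASK48)) & MASK48  # overflow is ignored
    # Bits 0-15 of R are carried into T without modification.
    return (r & UPPER16) | low
```

Because the carry out of bit 16 is dropped, the left-most 16 bits of T always equal those of R, whatever the address operands are.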


3.2.1.101 64 4 64 RG SUB U; (R)-(S) TO (T) 

3.2.1.102 65 4 64 RG SUB L; (R)-(S) TO (T) 

3.2.1.103 66 4 64 RG SUB N; (R)-(S) TO (T) 

These instructions perform the indicated floating- 
point arithmetic operation on the 64-bit floating- 
point operands contained in the registers designated 
by R and S. The result in each case is stored in the 
register designated by T. 

U signifies that the upper result of the operation is 
returned; L signifies the lower result; and N 
signifies the normalized upper result. 

Data flags: bits 42, 43 and 46 

3.2.1.104 67 4 64 RG SUB ADDRESS; (R)-(S) TO (T) 

This instruction subtracts bits 16 through 63 of 
register S from bits 16 through 63 of register R and 
stores the result in bits 16 through 63 of register 
T. Bits 16 through 63 are treated as 48-bit, positive, 
unsigned integers. Arithmetic overflow is ignored. 
Bits 0 through 15 of register R are transferred 
without modification to bits 0 through 15 of 
register T. 

3.2.1.105 68 4 64 RG MPY U; (R)*(S) TO (T) 

3.2.1.106 69 4 64 RG MPY L; (R)*(S) TO (T) 

3.2.1.107 6A ILLEGAL 

3.2.1.108 6B 4 64 RG MPY S; (R)*(S) TO (T) 

3.2.1.109 6C 4 64 RG DIV U; (R)/(S) TO (T) 

These instructions perform the indicated floating- 
point arithmetic operation on the 64-bit floating- 
point operands contained in the registers designated 
by R and S. The result in each case is stored in the 
register designated by T. 

U signifies that the upper result of the operation is 
returned; L signifies the lower result; S signifies 
the significant result. 

Data flags: bits 41, 42, 43 and 46 

3.2.1.110 60 4 64 RG INSERT BITS? (R) TO fT) PER (S) 

This instruction inserts the right-most bits of the 
register designated by R into the register 
designated by T. 


[Figure: INSERT BITS. The right-most m bits of Reg R are
inserted into Reg T beginning at bit n. The remaining
bits of Reg T are unaltered.]

Reg S layout:

Bits      0-9       10-15     16-57     58-63
Content   0 ... 0   m         0 ... 0   n
3.2.1.110 (Cont.)

Bits 10 through 15 of register S contain the number
(m) of right-most bits to be inserted. The right-most
6 bits of register S specify the bit number (n)
in register T where the left-most bit of the inserted
data will be placed. Bits 0 through 9 and 16 through
57 of register S are undefined and must be set to
zero.

If the R designator is equal to zero, then register
zero will provide machine zero. If m plus n is
greater than 64 (decimal), or if m is equal to zero,
the results of this instruction are undefined.
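
A minimal Python sketch of the INSERT BITS behavior follows. It
assumes bit 0 is the left-most bit of a 64-bit register held as a
Python integer; the function name is hypothetical, and the undefined
cases are reported as errors purely for illustration.

```python
MASK64 = (1 << 64) - 1


def insert_bits(r, t, s):
    """Insert the right-most m bits of r into t starting at bit n."""
    m = (s >> 48) & 0x3F        # bits 10 through 15 of register S
    n = s & 0x3F                # right-most 6 bits of register S
    if m == 0 or m + n > 64:
        raise ValueError("result undefined for this m and n")
    shift = 64 - (n + m)        # distance of the field from the right edge
    field_mask = ((1 << m) - 1) << shift
    field = (r & ((1 << m) - 1)) << shift
    return (t & ~field_mask & MASK64) | field   # other bits of T unaltered
```

For example, with m = 8 and n = 16 the low byte of R lands in bits 16
through 23 of T.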

3.2.1.111 6E 4 64 RG EXTRACT BITS; (R) TO (T) PER (S)

This instruction extracts bits from register R and 
stores them into the right-most portion of register 
T. Register T is cleared before receiving the 
extracted bits. 


[Figure: EXTRACT BITS. The m bits of Reg R beginning at
bit n are extracted into the right-most m bits of Reg T.
The remaining bits of Reg T are zero.]

Reg S layout:

Bits      0-9       10-15     16-57     58-63
Content   0 ... 0   m         0 ... 0   n
3.2.1.111 (Cont.)

Bits 10 through 15 of register S contain the number
(m) of bits to be extracted from register R. The
right-most 6 bits of register S specify the left-most
bit number (n) of the extracted bits. Bits 0 through 9
and 16 through 57 of register S are undefined and
must be set to zero.

If the R designator is equal to zero, register zero
will provide machine zero. If m plus n is greater
than 64 (decimal), or if m is equal to zero, the
results of this instruction are undefined.
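
The extraction can be sketched in Python as shown here. This is an
illustrative model only: bit 0 is assumed to be the left-most bit of
a 64-bit register held as a Python integer, and the function name is
hypothetical.

```python
def extract_bits(r, s):
    """Return m bits of r, starting at bit n, right-justified in T."""
    m = (s >> 48) & 0x3F        # bits 10 through 15 of register S
    n = s & 0x3F                # right-most 6 bits of register S
    if m == 0 or m + n > 64:
        raise ValueError("result undefined for this m and n")
    shift = 64 - (n + m)        # distance of the field from the right edge
    return (r >> shift) & ((1 << m) - 1)   # T is cleared apart from the field
```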

3.2.1.112 6F 4 64 RG DIV S; (R)/(S) TO (T)

This instruction performs a Divide Significant
operation on the 64-bit floating-point operands
contained in the registers designated by R and S.

The result is stored in the register designated by T.

Data flags: bits 41, 42, 43 and 46


3.2.1.113 70 A 64 RG TRUNCATE; (R) TO (T)

Transmit to destination register T the nearest
integer whose magnitude is less than or equal to the
magnitude of the 64-bit floating-point operand in
origin register R. The integer is represented as an
unnormalized 64-bit floating-point number having a
positive exponent.

If the exponent of the source operand is positive
(greater than or equal to zero), the operand is
transmitted directly to the destination register.

If the exponent of the source operand is negative,
the magnitude of the coefficient is shifted right
end off and the exponent is increased by one for each
bit position shifted until the exponent becomes zero.
Zeros are extended on the left during the shift. If
the coefficient of the source operand is positive,
the shifted coefficient with zero exponent is entered
into the destination register. If the coefficient of
the source operand is negative, the two's complement
of the shifted coefficient with zero exponent is
entered into the destination register.

If a machine zero is used as an operand, 64 zeros are
returned as a result.

Data flag: bit 46

3.2.1.114 71 A 64 RG FLOOR; (R) TO (T)

Transmit to destination register T the nearest
integer less than or equal to the 64-bit floating-
point operand in origin register R. This integer is
represented as an unnormalized 64-bit floating-point
number having a positive exponent.

If the source operand's exponent is positive (greater
than or equal to zero), the operand is transmitted
directly to the destination register.

If the exponent of the source operand is negative,
the coefficient is shifted right end off and the
exponent is increased by one for each bit position
shifted until the exponent becomes zero. Sign bits
are extended on the left during the shift. The
shifted coefficient with zero exponent is entered
into the destination register.

If a machine zero is used as an operand, 64 zeros are
returned as a result.

Data flag: bit 46

3.2.1.115 72 A 64 RG CEILING; (R) TO (T)

Transmit to destination register T the nearest
integer greater than or equal to the 64-bit floating-
point operand in origin register R. This integer is
represented as an unnormalized 64-bit floating-point
number having a positive exponent.

If the source operand's exponent is positive (greater
than or equal to zero), the operand is transmitted
directly to the destination register.

If the exponent of the source operand is negative,
the two's complement of the coefficient is shifted
right end off and the exponent is increased by one
for each bit position shifted until the exponent
becomes zero. Sign bits are extended on the left
during the shift. The two's complement of the shifted
coefficient with zero exponent is entered into the
destination register.

If a machine zero is used as an operand, 64 zeros are
returned as a result.

Data flag: bit 46
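
The three shift rules for TRUNCATE, FLOOR and CEILING can be compared
side by side in a small Python sketch. It is a simplified model under
stated assumptions: an operand is a pair (exponent, coefficient) of
signed Python integers with value coefficient * 2**exponent, the
special operands (machine zero, indefinite) and the data flags are
not modeled, and the function names are hypothetical.

```python
def truncate(exp, coef):
    """Shift the magnitude right, then restore the sign."""
    if exp >= 0:
        return exp, coef                 # transmitted directly
    mag = abs(coef) >> -exp              # zeros extended on the left
    return 0, (mag if coef >= 0 else -mag)


def floor(exp, coef):
    """Shift the signed coefficient right with sign extension."""
    if exp >= 0:
        return exp, coef
    return 0, coef >> -exp               # Python's >> extends the sign


def ceiling(exp, coef):
    """Complement, shift with sign extension, complement again."""
    if exp >= 0:
        return exp, coef
    return 0, -((-coef) >> -exp)
```

With coefficient -7 and exponent -1 (the value -3.5), the three
functions return -3, -4 and -3 respectively.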

3.2.1.116 73 A 64 RG SIGNIFICANT SQUARE ROOT; (R) TO (T)

Transmit to register T the square root of the 64-bit
floating-point operand in register R.

Data flags: bits 43, 45 and 46

3.2.1.117 74 4 64 RG ADJUST SIGNIFICANCE; (R) PER (S) TO (T)

Adjust the significance of the floating-point operand
in register R and transmit it to result register T.

A signed, two's complement integer is contained in
the right-most 48 bits of register S. The absolute
value of this integer is a shift count. The left-
most 16 bits of register S are ignored.

If the shift count is positive, shift the operand's
coefficient left the number of places specified by
the shift count or by the number of shifts needed to
normalize the coefficient, whichever is smaller. In
either case, the exponent of the operand is reduced
by one for each place actually shifted. An all-zero
coefficient will be shifted left the number of places
specified.

If the shift count is negative, shift the operand's
coefficient right the number of places specified by
the shift count and increase the exponent of the
operand by one for each place shifted.

This instruction is undefined if the absolute value
of the shift count is greater than 47 (decimal). Note
that the addition of the shift count can cause either
exponent overflow or exponent underflow.

If R is indefinite, T will be indefinite and data flag
bit 46 will be set. If R is machine zero, T will be
machine zero and data flag bit 43 will be set.

Data flags: bits 42, 43 and 46



CONTROL DATA CORPORATION    ENGINEERING SPECIFICATION
RADL    NO. 10354636    DATE Dec. 1977    PAGE 110    REV. A


3.2.1.118 75 4 64 RG ADJUST EXPONENT; (R) PER (S) TO (T)

Transmit the adjusted operand from register R to
result register T. The exponent of the result is set
equal to the exponent of the operand in register S.
The result is formed by shifting the coefficient of
the operand from register R.

The shift count used is the difference between the
exponents in registers R and S. If the exponent in
register R is greater/less than the exponent in
register S, the shift is to the left/right,
respectively. For zero coefficients in register R,
the exponent from register S is copied to register T
with an all-zero coefficient.

If a left shift exceeds the number of places
required for normalization, the result is set to
indefinite and data flag 42 is set. If either or
both operands are indefinite or machine zero, the
result is set to indefinite. In this case, data flag
bit 46 is set and data flag bit 42 is not set.
3.2.1.119 76 A B RG CONTRACT; 64-BIT (R) TO 32-BIT (T)

Contract the 64-bit floating-point number from
register R into a 32-bit floating-point number and
transmit the result to 32-bit register T.

Input exponent
(hexadecimal)     Result

7FFF - 7000       Result indefinite; data flag 46

6FFF - 0058       Result indefinite; data flags 42, 46

0057 - FF78       Result exponent 24 (decimal) larger
                  than input exponent; copy left-most
                  24 bits of input coefficient

FF77 - 8000       Result machine zero; data flag 43

The 24-bit result coefficient is copied from the
left-most 24 bits of the 48-bit source coefficient
(bits 16 through 39). This has the effect of
contracting all negative source coefficients, whose
absolute values (neglecting the exponent) were less
than or equal to 2^24, to a minus one.

Data flags: bits 42, 43 and 46
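
The exponent ranges and the coefficient copy described for CONTRACT
can be sketched in Python. This is an illustration under
assumptions: the exponent is passed as the raw 16-bit field, the
coefficient as a signed Python integer, and the result is returned
as a descriptive tuple rather than a packed 32-bit word. The function
name and the tuple layout are hypothetical.

```python
def contract(exp16, coef48):
    """Classify the input exponent and contract the coefficient."""
    if 0x7000 <= exp16 <= 0x7FFF:
        return ("indefinite", {46})            # data flag 46
    if 0x0058 <= exp16 <= 0x6FFF:
        return ("indefinite", {42, 46})        # data flags 42 and 46
    if 0x8000 <= exp16 <= 0xFF77:
        return ("machine zero", {43})          # data flag 43
    # Remaining range 0057 down to FF78: ordinary result.
    signed_exp = exp16 - 0x10000 if exp16 & 0x8000 else exp16
    # Keeping the left-most 24 bits of a signed 48-bit coefficient is
    # an arithmetic shift right by 24 places.
    return ("number", signed_exp + 24, coef48 >> 24)
```

A small negative coefficient such as -5 contracts to -1 in this
model, which matches the note above about negative coefficients of
small magnitude.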
3.2.1.120 77 A B RG ROUNDED CONTRACT; 64-BIT (R) TO 32-BIT (T)

Perform a rounded contract operation on the 64-bit
floating-point number in register R and transmit the
32-bit floating-point result to 32-bit register T.

A positive one is added to the origin operand in bit
position 40. If overflow occurs the exponent is
increased by one and the coefficient is shifted right
one place. The left-most 24 bits of this 48-bit sum
are then transmitted to the 24-bit coefficient
portion of register T. Each non-endcase result
element's 8-bit exponent is 24 (25 if overflow
occurred), in decimal, greater than the corresponding
source element's exponent.

Data flags: bits 42, 43 and 46

3.2.1.121 78 A 64 RG TRANSMIT; (R) TO (T)

Transmit the 64-bit operand in register R to
register T.

3.2.1.122 79 A 64 RG ABSOLUTE; (R) TO (T)

Transmit the absolute value of the 64-bit floating-
point operand in register R to register T.

Data flags: bits 42, 43 and 46

3.2.1.123 7A A 64 RG EXP; (R) TO (T)

Transmit the exponent from the left-most 16 bit
positions of origin register R to the right-most
16 bit positions of destination register T. The sign
of the exponent is extended through bit 16 of
destination register T. The left-most 16 bits of the
destination register are cleared to zeros.

3.2.1.124 7B 4 64 RG PACK; (R), (S) TO (T)

Transmit a 64-bit floating-point number to
destination register T. The exponent of the number
is obtained from the right-most 16 bit positions of
register R, and the coefficient is obtained from the
right-most 48 bit positions of register S.
3.2.1.125 7C A 64 RG LENGTH; (R) TO (T)

Transmit the left-most 16 bit positions of origin
register R to the right-most 16 bit positions of
destination register T. The left-most 48 bits of the
destination register are cleared to zeros.

3.2.1.126 7D 7 64 NT SWAP; S->T AND R->S

Move to destination field T a portion of the
Register File beginning at the 64-bit register
specified by the right-most eight bits of register S.
Transmit source field R to the Register File
beginning at the 64-bit register specified by the
right-most eight bits of register S.

The left-most 16 bits of registers R and T specify
the field length in words for the source and
destination fields, respectively. The field lengths
of the source and destination fields may be different
but each must be even. A zero field length indicates
no transfer for that field. Any transfer of words
into or out of the Register File that becomes
exhausted of registers (i.e., beyond the bounds of
the Register File) causes the instruction to become
undefined.

The right-most 48 bits of registers R and T specify
the base address of the source and destination
fields, respectively. These addresses must specify
an even 64-bit word in Main Memory. Bits 57
through 63 of registers R and T are undefined and must
be set to zero. Overlap of the source and destination
fields is allowed only if the base addresses for both
fields are equal.

Registers R, S, or T may be in the range of the
registers being swapped.

The starting register in the file specified by the
right-most eight bits of register S must be an even
register or this instruction will be treated as an
undefined instruction. For additional material see
Section 3.1.7 on the Register File.
3.2.1.127 7E 7 64 NT LOAD; (T) PER (S), (R)

3.2.1.128 7F 7 64 NT STORE; (T) PER (S), (R)

Load/store 64-bit register T from/into the address
specified by (R) + (S), where (R) is the base address
and (S) is an item count of words.


3.2.1.129 80 ILLEGAL

3.2.1.130 81 ILLEGAL

3.2.1.131 82 ILLEGAL

3.2.1.132 83 ILLEGAL

3.2.1.133 84 ILLEGAL

3.2.1.134 85 ILLEGAL

3.2.1.135 86 ILLEGAL

3.2.1.136 87 ILLEGAL

3.2.1.137 88 ILLEGAL

3.2.1.138 89 ILLEGAL

3.2.1.139 8A ILLEGAL

3.2.1.140 8B ILLEGAL

3.2.1.141 8C ILLEGAL

3.2.1.142 8D ILLEGAL

3.2.1.143 8E ILLEGAL

3.2.1.144 8F ILLEGAL

3.2.1.145 90 ILLEGAL

3.2.1.146 91 ILLEGAL

3.2.1.147 92 ILLEGAL

3.2.1.148 93 ILLEGAL

3.2.1.149 94 ILLEGAL

3.2.1.150 95 ILLEGAL

3.2.1.151 96 ILLEGAL

3.2.1.152 97 ILLEGAL

3.2.1.153 98 ILLEGAL

3.2.1.154 99 ILLEGAL

3.2.1.155 9A ILLEGAL

3.2.1.156 9B ILLEGAL

3.2.1.157 9C ILLEGAL

3.2.1.158 9D D E STREAM MAP

Depending on the value of the four-bit subfunction A,
the Map Unit performs the functions described.

Subformat D1 Register File Reference Mode (16-bit
parcel)

Field   Code   Operation

A       1      READ 1 Setup
        2      READ 2 Setup
        3      READ 3 Setup
        4      WRITE 1 Setup (W1A) source from READ 1
        5      WRITE 1 Setup (W1B) source from READ 2
        6      WRITE 1 Setup (W1C) source from
               Compress/Mask/Merge Net
        7      WRITE 1 Setup (W1D) source from
               Gather/Scatter Net
        8      WRITE 1 Setup (W1E) source from Vector Unit

B       0      Register File Reference Mode
               (Subformat D1)

C       0      64-bit Mode
        1      32-bit Mode
3.2.1.158 (Cont.)

Field   Code    Operation

D (for A code = 1-3, READ Setup)
        0       Extend indefinite on input vector
        1       Extend floating-point zero on input vector
        2       Extend floating-point one on input vector
        3       Repeat input vector (for 64-bit mode,
                lowest three bits of field length must be
                zero; for 32-bit mode, lowest four bits of
                field length must be zero; if field length
                is zero, operand is broadcast)

D (for A code = 4-8, WRITE Setup)
        0       Order vector operates on ones
        1       Order vector operates on zeros
        2, 3    Not defined

E       00-FF   Register file designator

Subformat D2 Immediate Reference Mode (64-bit parcel)

Field   Code   Operation

A       1      READ 1 Setup
        2      READ 2 Setup
        3      READ 3 Setup
        4      WRITE 1 Setup (W1A) source from READ 1
        5      WRITE 1 Setup (W1B) source from READ 2
        6      WRITE 1 Setup (W1C) source from
               Compress/Mask/Merge Net
        7      WRITE 1 Setup (W1D) source from
               Gather/Scatter Net
        8      WRITE 1 Setup (W1E) source from Vector Unit

B       1      Immediate Reference Mode
               (Subformat D2)

C       0      64-bit Mode
        1      32-bit Mode
3.2.1.158 (Cont.)

Field   Code    Operation

D (for A code = 1-3, READ Setup)
        0       Extend indefinite on input vector
        1       Extend floating-point zero on input vector
        2       Extend floating-point one on input vector
        3       Repeat input vector (for 64-bit mode,
                lowest three bits of field length must be
                zero; for 32-bit mode, lowest four bits of
                field length must be zero; if field length
                is zero, operand is broadcast)

D (for A code = 4-8, WRITE Setup)
        0       Order vector operates on ones
        1       Order vector operates on zeros
        2, 3    Not defined

E               16-bit field length

F               28-bit sword address

G               Lower address bits (used for shift count)

Z               Unused

Subformat D4 S1, S2 Connection Setup (64-bit parcel)

Field   Code    Operation

A       9       S1, S2 Connection setup

B, D            Destination code (B for S1, D for S2)
        0       No change
        1       Vector Unit
        2       Buffer Unit
        3       Both Vector Unit and Buffer Unit
        4       Internal to Map Unit

C, E            Source code (C for S1, E for S2)
        0       No change
        1       R1
        2       R2
        3       Compress/Mask
        4       Merge
        5       Gather

Subformat D3 Map Unit Functions (16-bit parcel)

Field   Code    Operation

A       0       No op
        A       Map Unit functions
        B       Clear

B (for A code = A only, otherwise ignored)
        1       Gather
        2       Scatter
        3       Compress
        4       Mask
        5       Merge
        6       Form Order Vector

C (for A code = A) C field bits specify the following:

        2^7 = 1        order vector test greater than
        2^6 = 1        order vector test not equal
        2^5 = 1        32-bit operands (=0, 64-bit)
        2^4 = 1        move records (=0, move words)
        2^3 = 1        order vector operates on zeros
                       (=0, on ones)
        2^2, 2^1, 2^0  not defined

C (for A code = B) C field bits each specify a
unit to be cleared as follows:

        2^7     READ 1
        2^6     READ 2
        2^5     READ 3
        2^4     WRITE 1
        2^3     Compress/Mask/Merge
        2^2     Gather/Scatter
        2^1     S1
        2^0     S2
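
As an illustration of the clear mask, the following Python sketch
lists the units selected by the C field of a Clear parcel (A code =
B). It assumes the 16-bit parcel holds A in its left-most four bits
and C in its right-most eight bits; the function and table names are
hypothetical.

```python
# Index k corresponds to the C field bit with weight 2**k.
CLEAR_UNITS = [
    "S2", "S1", "Gather/Scatter", "Compress/Mask/Merge",
    "WRITE 1", "READ 3", "READ 2", "READ 1",
]


def units_cleared(parcel):
    """List the units named by the C field of a 16-bit Clear parcel."""
    if parcel >> 12 != 0xB:
        raise ValueError("not a Clear parcel (A code must be B)")
    c_field = parcel & 0xFF
    return [CLEAR_UNITS[k] for k in range(7, -1, -1) if c_field & (1 << k)]
```

A parcel value of B028 (hexadecimal), for instance, selects READ 3
and Compress/Mask/Merge.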
With a string of subfunction parcels taken from the
previously listed functions, one can perform a number
of operations within the Map Unit, or link the Map
Unit to the Vector Unit to provide operand streams or
to take result operands from the Vector Unit. Some
examples of this linkage are:

Vector Transmit 10 (hexadecimal) 64-bit words from
memory address 10000000 to address 1000000.

Assume that the register file is set up as follows:

Register 10 = 0010000010000000

Register 11 = 0010000001000000

and the instruction buffer holds the following:

(All quantities in hexadecimal)

Total command: 9D01 1010 4012 B028

Header (indirect mode for both vector references)

9D01    9D directs the operation to the Map Unit;
        the 1 indicates that one 32-bit packet
        follows the header packet.

First parcel: 1010    Function code 1 in the first
                      four bits indicates that this
                      is a READ 1 setup.

                      The next four bits are zero,
                      indicating that:

        B field=0 -- indirect reference mode
        C field=0 -- 64-bit mode operation
        D field=0 -- Extend A field (if
                     shorter than C field)
                     with indefinites.

                      The next 8 bits = 10, the
                      register designator pointing
                      to register 10 (hexadecimal).

3.2.1.158 (Cont.)

Second parcel: 4012   Function code 4 in the first
                      four bits indicates a WRITE 1
                      setup with source from READ 1;
                      B, C, D fields all zero
                      indicating 64-bit mode,
                      indirect reference to the
                      register file, and extension
                      mode is ignored; the
                      remaining 8 bits contain the
                      register designator 12
                      pointing to the register
                      containing the starting
                      address and length of the
                      output stream to be written
                      from WRITE 1.

Third parcel: B028    Function code B indicates
                      CLEAR operation; 28 specifies
                      that R3 is to be disconnected
                      (no order vector) and
                      Compress/Mask/Merge is to be
                      cleared (from any previous
                      operation).

The same function could be programmed using the
instruction itself to contain the field lengths and
base addresses:

9D04B02818000010100000004800001001000000

which could be broken down into the following fields:

Header 9D04           Function sent to Map Unit, 4
                      32-bit packets follow to
                      describe the operation.

First parcel: B028    Clear R3 and
                      Compress/Mask/Merge

Second parcel: 1800001010000000

        First four bits = 1 meaning a READ 1
        setup; next four bits = 8 meaning that
        this is an immediate instruction
        (memory address and length to come
        directly from the instruction
        itself), 64-bit mode, extend
        indefinites; the next 8 bits
        are unused for the 9D
        instructions; the next 16
        bits contain 0010, a field
        length of 16 (decimal) elements; the
        remaining 32 bits contain the
        bit base address of the source field.

Third parcel: 4800001001000000

        First four bits = 4 meaning a WRITE 1
        setup with source from READ 1; the
        next four bits = 8, meaning that B=1
        or the memory address and length are
        contained in the instruction; the next
        eight bits are unused, and the
        remaining 48 bits contain the 16-bit
        field length and 32-bit base address
        of the destination field.

Note that one field could be described by an
immediate parcel and the other by an indirect
reference to the Register File; thus a
9D0210104800001001000000 would have an indirect
reference to register 10 for the source field while
the destination field address of 1000000 would be
contained in the instruction itself.
A gather operation which would retrieve every fifth
element from memory beginning at address 10000000
would require an additional parcel to be sent to the
Map Unit to direct the gather operation. In this
case, let register 05 contain 0001000000500000 and
location 500000 contain 0000000000000005. Then
the instruction would appear as:

9D02101033057012A1000000

which can be broken down as follows:

Header 9D02       Send function to Map Unit, 2
                  32-bit packets follow

Parcel 1: 1010 -- Setup READ 1 with register 10
                  (base address of vector)

Parcel 2: 3305 -- Setup READ 3 from register 5
                  (pointer to increment), repeat
                  vector (increment)

Parcel 3: 7012 -- Setup WRITE 1 with source from
                  Gather/Scatter Network, base
                  address of output from register
                  12

Parcel 4: A100 -- A=functional control of Map Unit
                  internal modules,
                  1=Gather operation,
                  00=64-bit elements

Parcel 5: 0000 -- No op to fill packet
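
The parcel breakdowns in these examples can be reproduced with a
short Python sketch. It rests on assumptions drawn from the worked
examples rather than from a stated format: the second four bits of a
setup parcel are taken as B (1 bit), C (1 bit) and D (2 bits), and a
setup parcel with B=1 is taken to be 64 bits long. Connection setup
parcels (A code = 9) are not handled. All names are hypothetical.

```python
def split_parcels(command_hex):
    """Split a 9D command (hex string) into its header and parcels."""
    digits = command_hex.replace(" ", "").upper()
    header, body = digits[:4], digits[4:]
    parcels, i = [], 0
    while i < len(body):
        a = int(body[i], 16)
        if a == 9:
            raise NotImplementedError("connection setup not modeled")
        b_set = int(body[i + 1], 16) & 0x8
        # Setup parcels (A = 1..8) with B set are 64-bit immediates.
        width = 16 if (1 <= a <= 8 and b_set) else 4
        parcels.append(body[i:i + width])
        i += width
    return header, parcels


def decode_setup(parcel_hex):
    """Decode the fields of a READ/WRITE setup parcel."""
    value = int(parcel_hex[:4], 16)
    fields = {"A": value >> 12, "B": (value >> 11) & 1,
              "C": (value >> 10) & 1, "D": (value >> 8) & 3}
    if fields["B"] == 0:
        fields["E"] = value & 0xFF              # register designator
    else:
        fields["length"] = int(parcel_hex[4:8], 16)
        fields["address"] = int(parcel_hex[8:16], 16)
    return fields
```

Applied to the gather command 9D02101033057012A1000000, the splitter
yields the five parcels 1010, 3305, 7012, A100 and 0000.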
3.2.1.159 9E D E SM BUFFER READ/WRITE SETUP

This instruction provides individual setup for each
of four ports in the Buffer Unit. A buffer
address and vector length can be provided for RB1
and RB2 (Read Buffer 1 and Read Buffer 2) and for
WB1 and WB2 (Write Buffer 1 and Write Buffer 2).
Subfunctions are specified by the A field of the
instruction subformats.

Subformat D1 Register File Reference Mode (16-bit
parcel)

Field   Code    Operation

A       0       No op
        1       Set up RB1 Port
        2       Set up RB2 Port
        3       Set up WB1 Port
        4       Set up WB2 Port

B       0       Register File Reference Mode
                (Subformat D1)

C       0       64-bit Mode
        1       32-bit Mode

D (for write setup, WB1 and WB2 source)
        0       S1 (Source 1)
        1       S2 (Source 2)
        2       AR1 (Arithmetic Result 1)
        3       AR2 (Arithmetic Result 2)

D (extension form for nonconformal vectors, RB1
and RB2 setup)
        0       Extend this stream with indefinites
        1       Extend this stream with machine zero
        2       Extend this stream with floating-point
                ones
        3       Repeat this stream from the beginning

E       00-FF   Register File Designator (for
                base address and field length)
Subformat D5 Immediate Reference Mode (32-bit
parcel)

Field   Code    Operation

A       1       Set up RB1 Port
        2       Set up RB2 Port
        3       Set up WB1 Port
        4       Set up WB2 Port

B       1       Immediate Reference Mode
                (Subformat D5)

C       0       64-bit Mode
        1       32-bit Mode

D (for write setup, WB1 and WB2 source)
        0       S1 (Source 1)
        1       S2 (Source 2)
        2       AR1 (Arithmetic Result 1)
        3       AR2 (Arithmetic Result 2)

D (extension form for nonconformal vectors, RB1 and
RB2 setup)
        0       Extend this stream with indefinites
        1       Extend this stream with machine zero
        2       Extend this stream with floating-point
                ones
        3       Repeat this stream from the beginning

E   12-bit field length of vector in words (if C
    field is zero) and half-words (if C field is a
    one); the right-most three (or four) bits of the
    field length are ignored, thus all vector
    functions from and to the buffer operate on
    groups of 8 words (or 16 half-words).

F   12-bit buffer base address in swords (eight-word
    groups). Thus a base address of 000 (hexadecimal)
    would address words 0 through 7 of the vector and
    base address 001 (hexadecimal) would reference
    words 8 through 15 (decimal) of the buffer.
Base addresses taken from the Register File are
bit addresses; however, the low order nine bits
are ignored, effectively making the address to
the buffer a sword address.

Example: performing a simple add of two vectors
contained in the buffer, with the result returned to
the buffer, requires three buffer setup
commands and a Vector Unit command. The buffer
instructions would appear as follows (all quantities
in hexadecimal):

9E03    Instruction header for Buffer Unit
        setup, 3 packets to follow.

Parcel 1: 18100000

        A field = 1, Setup RB1
        B field = 1, Immediate Reference
        C field = 0, 64-bit Mode
        D field = 0, Extend with indefinites
        E field = 100, number of similar
                  elements to be processed
                  (field length)
        F field = 000, elements start
                  at buffer address 000
                  (base address)

Parcel 2: 28100100

        A field = 2, Setup RB2
        B field = 1, Immediate Mode
        C field = 0, 64-bit Mode
        D field = 0, Extend stream with
                  indefinites
        E field = 100, number of elements
                  to be processed (field
                  length)
        F field = 100, elements start at
                  buffer address 100
                  (base address)

Parcel 3: 3A100200

        A field = 3, Setup WB1 Port
        B field = 1, Immediate Mode
        C field = 0, 64-bit Mode
        D field = 2, Select AR1 bus for
                  results from Vector Unit
        E field = 100, number of elements
                  to be stored in the buffer
        F field = 200, start at base
                  address 200

Thus the sequence 9E03 0000 18100000 28100100 3A100200
would deliver two streams of data to the Vector Unit via
RB1 (Read Buffer 1) and RB2 (Read Buffer 2) starting at
addresses 000 and 100 (hexadecimal), respectively, and
continuing for 100 (hexadecimal) elements. The results
from the Vector Unit would be stored into the buffer
beginning at base address 200 (hexadecimal) for 100
(hexadecimal) elements. Note that a no op (0000) was
inserted after 9E03 to fill out the header packet.
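
The three immediate parcels in this example can be rebuilt from
their fields with a short Python sketch. The bit positions used here
(A in the left-most four bits, then B, C, a two-bit D, a 12-bit E and
a 12-bit F) are an assumption inferred from the worked example, and
the function name is hypothetical.

```python
def encode_buffer_setup(a, c, d, length, base):
    """Pack a Subformat D5 (immediate, 32-bit) buffer setup parcel."""
    b = 1                                   # immediate reference mode
    return ((a << 28) | (b << 27) | (c << 26) | (d << 24)
            | ((length & 0xFFF) << 12) | (base & 0xFFF))


# The three parcels of the vector add example.
rb1 = encode_buffer_setup(a=1, c=0, d=0, length=0x100, base=0x000)
rb2 = encode_buffer_setup(a=2, c=0, d=0, length=0x100, base=0x100)
wb1 = encode_buffer_setup(a=3, c=0, d=2, length=0x100, base=0x200)
```

Formatting each value with `format(value, "08X")` gives 18100000,
28100100 and 3A100200, the parcels listed in the example.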


3.2.1.160 9F E E SM VECTOR ARITHMETIC

This 32-bit instruction controls the operations, and
selection of input and output data busses, for the
Vector Unit. Referring to instruction format E, the
fields are defined as follows:

Field     Function          Codes and Meaning

A         Operation Code    9F  Vector Arithmetic

B         Suboperation (suboperations 00-17 are performed
          with normalized arithmetic)

          Code   AR1 Bus             AR2 Bus

          00     A                   C
          01     B                   D
          02     A+B                 C+D
          03     A+B                 C*D
          04     A*B                 C+D
          05     (A+B)*D             A+B
          06     (A+B)*(C*D)         C+D
          07     (A+B)*(C+D)         A+B
          08     (A+B)*(C+D)         Expand 32-bit C
                                     to 64 bits
          09     A+B+D               A+B
          0A     (A+B)+C*D           C*D
          0B     (A+B)*C+D           (A+B)*C
          0C     (A*B)+(C*D)         C+D
          0D     A*(B+C*D)           B+(C*D)
          0E     (A+B)*D             (A+B)*C
          0F     (A*C)+D             (A*C)+B
          10     (A*B)+D             C*D
          11     (A*B)+D             C+D
          12     DIVIDE 1
          13     DIVIDE 2
          14     Sum of products
          15     Product of sums
          16     Sum
          17     Product
          18     A+B Upper Sum       C+D Upper Sum
          19     A+B Lower Sum       C+D Lower Sum
          1A     A+B Upper Sum       C*D Upper Product
          1B     A+B Lower Sum       C*D Lower Product
          1C     A*B Upper Product   C*D Upper Product
          1D     A*B Lower Product   C*D Lower Product

C,D,E,F   Source busses for A, B, C, D streams

          0      Source 1 (S1), from Map Unit
          1      Source 2 (S2), from Map Unit
          2      Read Buffer 1 (RB1), from Buffer Unit
          3      Read Buffer 2 (RB2), from Buffer Unit

G         Round/No round

          0      No round results
          1      Round results

H,J       Complement B, D

          0      No complement
          1      Complement operands

K         Null field; must be zero

L,M       Select result busses to buffer
          (L=AR1 select, M=AR2 select)

          0      Do not select arithmetic result to
                 buffer
          1      Select arithmetic result to buffer

N         Write Bus 1 select

          0      No select
          1      Select AR1 to Write Bus 1
          2      Select AR2 to Write Bus 1
          3      Illegal
3.2.1.160 (Cont.)

Example:

The Vector Unit has four input busses and three
output busses associated with it.

The busses are:

S1    Source 1 from the Map Unit
S2    Source 2 from the Map Unit
RB1   Read Buffer 1 from the Buffer Unit
RB2   Read Buffer 2 from the Buffer Unit

AR1   Arithmetic Result 1 -- to the Buffer Unit
AR2   Arithmetic Result 2 -- to the Buffer Unit
W1    WRITE 1 -- to the Map Unit
      (either output AR1 or AR2 can be selected
      into the write bus trunk)

The contents of the internal busses AR1 and AR2 are
defined by the suboperation field B in the
instruction. A simple vector add utilizing data from
Main Memory (via the Map Unit) and placing results
back into Main Memory (via the Map Unit) requires a
subfunction code of 2, with source operands into A
and B selected from S1 and S2 respectively and with
the AR1 output sent to W1. The resulting instruction
would appear as:

9F021101 with the fields broken down as
follows:

A=9F      Vector arithmetic
B=02      Select A+B and C+D to AR1 and AR2
          respectively
C=0       A source from S1
D=1       B source from S2
E=0       C source from S1
F=1       D source from S2
          (C and D are unused in this
          example but necessary to activate
          the error checking)
G,H,J=0   No complement, no rounding
K=0       Must be zero
L,M=0     Do not select AR1
          or AR2 to buffer ports
N=1       Select AR1 (which will contain
          A+B) into WRITE 1

Note that identical input selections into the A,C and
B,D operands, and identical functions A+B and C+D,
cause an automatic checking of the two adder outputs.
In addition, the outputs of the multipliers, though
idle, are compared for error checking during the
processing of the addition operations.
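
The field breakdown of the example instruction can be checked with a
small Python encoder. The bit widths used (8 bits each for A and B,
2 bits each for C through F, 1 bit each for G, H, J, K, L and M, and
2 bits for N) are an assumption chosen to be consistent with the
example value 9F021101; instruction format E itself is not
reproduced in this section. The function and parameter names are
hypothetical.

```python
def encode_vector_arithmetic(sub_op, a_src, b_src, c_src, d_src,
                             rounding=0, comp_h=0, comp_j=0,
                             ar1_to_buffer=0, ar2_to_buffer=0,
                             write_select=0):
    """Pack a 9F Vector Arithmetic instruction into a 32-bit word."""
    word = (0x9F << 24) | (sub_op << 16)                 # fields A and B
    word |= (a_src << 14) | (b_src << 12)                # fields C and D
    word |= (c_src << 10) | (d_src << 8)                 # fields E and F
    word |= (rounding << 7) | (comp_h << 6) | (comp_j << 5)  # G, H, J
    # Field K (bit weight 2**4) must be zero and is left clear.
    word |= (ar1_to_buffer << 3) | (ar2_to_buffer << 2)  # fields L and M
    word |= write_select                                 # field N
    return word


# Vector add: A and C from S1, B and D from S2, AR1 routed to WRITE 1.
vector_add = encode_vector_arithmetic(0x02, 0, 1, 0, 1, write_select=1)
```

Under these assumptions `format(vector_add, "08X")` reproduces the
instruction shown in the example.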


3.2.1.161 A0 ILLEGAL

3.2.1.162 A1 ILLEGAL

3.2.1.163 A2 ILLEGAL

3.2.1.164 A3 ILLEGAL

3.2.1.165 A4 ILLEGAL

3.2.1.166 A5 ILLEGAL

3.2.1.167 A6 ILLEGAL

3.2.1.168 A7 ILLEGAL

3.2.1.169 A8 ILLEGAL

3.2.1.170 A9 ILLEGAL

3.2.1.171 AA ILLEGAL

3.2.1.172 AB ILLEGAL

3.2.1.173 AC ILLEGAL

3.2.1.174 AD ILLEGAL

3.2.1.175 AE ILLEGAL

3.2.1.176 AF ILLEGAL


3.2.1.177 B0 C E BR COMPARE INTEGER, BRANCH IF (A) + (X) EQ (Z)

3.2.1.178 B1 C E BR COMPARE INTEGER, BRANCH IF (A) + (X) NE (Z)

3.2.1.179 B2 C E BR COMPARE INTEGER, BRANCH IF (A) + (X) GE (Z)

3.2.1.180 B3 C E BR COMPARE INTEGER, BRANCH IF (A) + (X) LT (Z)

3.2.1.181 B4 C E BR COMPARE INTEGER, BRANCH IF (A) + (X) LE (Z)

3.2.1.182 B5 C E BR COMPARE INTEGER, BRANCH IF (A) + (X) GT (Z)


If bit 0 of the G designator is cleared/set, 
registers A, X» C and Z are 64/32 bits respectively. 
Registers B and Y are always 64 bits. 

G bits 1 and 2 must be set to zero. 
These instructions are executed in the following 5
steps:

1.  Form the sum of the 48-bit (24-bit if G bit 0 = 1)
    integers from the right-most portion of
    registers A and X, ignoring overflows. If
    designators A and/or X equal zero, machine zero
    will be supplied.

2.  Read register Z. If the Z designator is equal to
    zero, a compare against 48 zeros (24 zeros if G bit
    0 = 1) may be made.

3.  Store the following in register C:

    o  The sum from step 1 is stored into the
       right-most 48 bits (24 bits if G bit 0 = 1) of
       register C.

    o  The left-most 16 bits (8 bits if G bit 0 = 1)
       of register A are copied into the left-most
       portion of register C.

4.  Compare the sum formed in step 1 with register Z
    as follows:

    o  G bit 3=0   The integers compared are the
                   48-bit (24 bits if G bit 0 = 1)
                   result of step 1 and the
                   right-most 48 bits (24 bits if G
                   bit 0 = 1) read from register Z
                   in step 2.

    o  G bit 3=1   The integers compared are the 64
                   bits that are stored into
                   register C in step 3 and 64 bits
                   read from register Z in step 2.

                   This compare is defined only for
                   the B0 and B1 instructions (EQ
                   and NE).

                   When both G bit 0 and G bit 3 are
                   1 the instructions are undefined.
    o  G bit 4=0   The integers compared are
                   interpreted as signed two's
                   complement numbers.

    o  G bit 4=1   The integers compared are
                   interpreted as unsigned numbers.

The following table indicates the ordering of numbers
from largest to smallest as controlled by G bit 4.

              G bit 4 = 0      G bit 4 = 1

   Largest    7F---FF          FF---FF
              7F---FE          FF---FE
                 .                .
                 .                .
              00---01          80---01
              00---00          80---00
              FF---FF          7F---FF
                 .                .
                 .                .
              80---01          00---01
   Smallest   80---00          00---00


5.  If the specified compare condition is met, the
    instruction performs as follows:

    o  G bit 5=0   Branch to the address formed by
                   adding the half-word item count
                   from register Y, left shifted 5
                   places, to the base address from
                   register B.

    o  G bit 5=1   Branch to the address formed by
                   adding (G bit 6 = 0) or
                   subtracting (G bit 6 = 1) the
                   half-word item count from the B
                   and Y designators (16 bits),
                   left shifted 5 places, to the
                   program address of this
                   instruction.


If the specified compare condition is not met, the
instructions will continue execution at the next
sequential instruction.

If any of the following conditions occur, the
operation of these instructions is undefined:

o  G bit 0=1 and G bit 3=1

o  G bit 3=1 for B2, B3, B4 and B5

o  G bit 5=0 and G bit 6=1
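
The compare portion of these instructions (steps 1 through 4) can be sketched in Python as follows. This is an illustrative model only, not taken from the specification: the function name `compare_integer`, the passing of G bits 0, 3 and 4 as separate arguments `g0`, `g3`, `g4`, and the representation of register contents as non-negative Python integers are all assumptions made for the example.

```python
import operator

# Relation tested by each function code (B0..B5), per the instruction titles.
COMPARES = {"EQ": operator.eq, "NE": operator.ne, "GE": operator.ge,
            "LT": operator.lt, "LE": operator.le, "GT": operator.gt}


def to_signed(value, bits):
    """Interpret a bits-wide pattern as a two's complement number."""
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def compare_integer(a, x, z, op, g0=0, g3=0, g4=0):
    """Return (value stored in C, compare condition met)."""
    if g0 and g3:
        raise ValueError("undefined: G bit 0 = 1 and G bit 3 = 1")
    if g3 and op not in ("EQ", "NE"):
        raise ValueError("undefined: G bit 3 = 1 is only defined for EQ and NE")

    word_bits = 32 if g0 else 64      # register width
    int_bits = 24 if g0 else 48       # width of the integer field
    word_mask = (1 << word_bits) - 1
    int_mask = (1 << int_bits) - 1

    # Step 1: add the right-most integer fields, ignoring overflow.
    total = ((a & int_mask) + (x & int_mask)) & int_mask

    # Step 3: C gets the sum plus the left-most bits of A.
    c = (a & word_mask & ~int_mask) | total

    # Step 4: choose what is compared (G bit 3) and how (G bit 4).
    if g3:
        left, right, bits = c, z & word_mask, word_bits
    else:
        left, right, bits = total, z & int_mask, int_bits
    if not g4:
        left, right = to_signed(left, bits), to_signed(right, bits)
    return c, COMPARES[op](left, right)
```

With G bit 4 = 0 a field of all ones is treated as -1 and therefore compares less than 1; with G bit 4 = 1 the same pattern is the largest unsigned value, which matches the ordering table above.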

The CDC FMP has expanded capabilities for the B0
through B5 instructions implemented by means of G
bit 0 through 3 combinations.


B0 C E NT COMPARE INTEGER, SET CONDITION IF (A) + (X) EQ (Z)

B1 C E NT COMPARE INTEGER, SET CONDITION IF (A) + (X) NE (Z)

B2 C E NT COMPARE INTEGER, SET CONDITION IF (A) + (X) GE (Z)

B3 C E NT COMPARE INTEGER, SET CONDITION IF (A) + (X) LT (Z)

B4 C E NT COMPARE INTEGER, SET CONDITION IF (A) + (X) LE (Z)

B5 C E NT COMPARE INTEGER, SET CONDITION IF (A) + (X) GT (Z)


If bit 0 of the G designator is cleared/set,
registers A, X, Y, C and Z are 64/32 bits
respectively. Register B is not used and must be
set to zero.

G bit 1=0 and G bit 2=1

These instructions are executed in 5 steps of which
the first four (compare) steps are identical to the
first four steps described for B0 through B5
instructions with G bits 1 and 2 equal to zero
(compare branch).
If the specified compare condition is met, the
instruction performs as follows:

    Store into register Y the 64-bit quantity
    (32-bit if G bit 0 = 1) 000---001 and
    continue execution at the next sequential
    instruction.

If the specified compare condition is not met, the
instruction performs as follows:

    Store into register Y the 64-bit quantity
    (32-bit if G bit 0 = 1) 000---000 and
    continue execution at the next sequential
    instruction.


If any of the following conditions occur, the
operation of these instructions is undefined:

o  G bit 0=1 and G bit 3=1

o  G bit 3=1 for B2, B3, B4 and B5

o  G bit 5=1, G bit 6=1 or G bit 7=1

o  The C designator is equal to the Z designator

B0 C E BR COMPARE F.P., BRANCH IF (A) + (X) EQ (X)

B1 C E BR COMPARE F.P., BRANCH IF (A) + (X) NE (X)

B2 C E BR COMPARE F.P., BRANCH IF (A) + (X) GE (X)

B3 C E BR COMPARE F.P., BRANCH IF (A) + (X) LT (X)

B4 C E BR COMPARE F.P., BRANCH IF (A) + (X) LE (X)

B5 C E BR COMPARE F.P., BRANCH IF (A) + (X) GT (X)


If bit 0 of the G designator is cleared/set,
registers A and X are 64/32 bits respectively.
Registers B and Y are always 64 bits. Registers C
and Z are not used and must be set to zero.

G bit 1=1 and G bit 2=0

These instructions compare the two floating-point
operands from registers A and X according to the
floating-point compare rules in Section 3.1.4.5.
If the specified compare condition is met, the
instructions perform as follows:

o  G bit 5=0   Branch to the address formed by
               adding the half-word item count from
               register Y, left shifted 5 places,
               to the base address from register B.

o  G bit 5=1   Branch to the address formed by
               adding (G bit 6 = 0) or subtracting
               (G bit 6 = 1) the half-word item
               count from the B and Y designators
               (16 bits), left shifted 5 places, to
               the program address of this
               instruction.

If the specified compare condition is not met, the
instructions will continue execution at the next
sequential instruction.

If any of the following conditions occur, the
operation of these instructions is undefined:

o  G bit 3=1, G bit 4=1 or G bit 7=1

o  Designator Z and/or C not equal to zero

o  G bit 5=0 and G bit 6=1


Data Flag: bit 46.


B0 C E NT COMPARE F.P., SET CONDITION IF (A) + (X) EQ (Z)

B1 C E NT COMPARE F.P., SET CONDITION IF (A) + (X) NE (Z)

B2 C E NT COMPARE F.P., SET CONDITION IF (A) + (X) GE (Z)

B3 C E NT COMPARE F.P., SET CONDITION IF (A) + (X) LT (Z)

B4 C E NT COMPARE F.P., SET CONDITION IF (A) + (X) LE (Z)

B5 C E NT COMPARE F.P., SET CONDITION IF (A) + (X) GT (Z)


If bit 0 of the G designator is cleared/set,
registers A, X, and Y are 64/32 bits respectively.
Registers B, C and Z are not used and must be set to
zero.
G bit 1=1 and G bit 2=1

These instructions compare the two floating-point
operands from registers A and X according to the
floating-point compare rules in Section 3.1.4.5.

If the specified compare condition is met, the
instruction performs as follows:

    Store into register Y the 64-bit quantity
    (32-bit if G bit 0 = 1) 000---000 and continue
    execution at the next sequential instruction.

If the specified compare condition is not met, the
instruction performs as follows:

    Store into register Y the 64-bit quantity
    (32-bit if G bit 0 = 1) 000---001 and continue
    execution at the next sequential instruction.


If any of the following conditions occur, the
operation of these instructions is undefined:

o  Any one of G bits 3 through 7 is set

o  Designators B, Z and/or C are not equal to zero
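
The four forms of the B0 through B5 instructions are selected by G bits 1 and 2, with G bit 0 selecting the operand width. A small Python sketch of that selection is given here for orientation. The helper names `g_bit` and `describe` are hypothetical, and the sketch assumes that bit 0 of the 8-bit G designator is its left-most bit; that numbering is an assumption of the example, not something stated in this section.

```python
# (G bit 1, G bit 2) -> (operand type, action when the condition is met)
VARIANTS = {
    (0, 0): ("integer", "branch"),
    (0, 1): ("integer", "set condition"),
    (1, 0): ("floating-point", "branch"),
    (1, 1): ("floating-point", "set condition"),
}


def g_bit(g, n):
    """Bit n of the 8-bit G designator, assuming bit 0 is the left-most bit."""
    return (g >> (7 - n)) & 1


def describe(g):
    """Return (operand type, action, operand width in bits) for a G designator."""
    operand_type, action = VARIANTS[(g_bit(g, 1), g_bit(g, 2))]
    width = 32 if g_bit(g, 0) else 64
    return operand_type, action, width
```

For example, under the stated bit-numbering assumption a G designator of binary 0100 0000 selects the 64-bit floating-point compare-and-branch form.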
3.2.1.183 B6 5 NA BR BRANCH TO IMMEDIATE ADDRESS, (R) + I (48 BITS)


The right-most 48 bits of register R contain an item
count of half-words. The right-most 48 bits of the
instruction word contain an immediate operand which
is used as a base address. An unconditional branch
is taken to the branch address formed by adding the
item count to the base address (the item count is
shifted left 5 places before the addition and
overflow, if any, is ignored).

A direct branch is taken to the base address from the
instruction word if the R designator is zero or if
the right-most 48 bits of register R are zeros.
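
The address formation described above can be illustrated with a short Python sketch. The function name `branch_to_immediate` and its argument names are hypothetical; the sketch simply models the 48-bit fields as integers and treats "overflow is ignored" as truncation of the result to 48 bits, which is an interpretation rather than a statement from the specification.

```python
ADDRESS_BITS = 48
ADDRESS_MASK = (1 << ADDRESS_BITS) - 1


def branch_to_immediate(r_designator, r_value, immediate):
    """Return the branch address for the B6 instruction.

    r_designator: register number named by the R field (0 means no register)
    r_value:      contents of register R
    immediate:    right-most 48 bits of the instruction word (base address)
    """
    base = immediate & ADDRESS_MASK
    if r_designator == 0:
        return base                      # direct branch to the base address
    count = r_value & ADDRESS_MASK       # half-word item count
    # Shift the half-word count left 5 places (32 bits per half-word)
    # and add it to the base; any overflow past 48 bits is dropped.
    return (base + (count << 5)) & ADDRESS_MASK
```

A zero item count gives the base address unchanged, which agrees with the direct-branch rule in the last paragraph above.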


3.2.1.184 B7 ILLEGAL

3.2.1.185 B8 ILLEGAL

3.2.1.186 B9 ILLEGAL

3.2.1.187 BA ILLEGAL

3.2.1.188 BB ILLEGAL

3.2.1.189 BC ILLEGAL

3.2.1.190 BD ILLEGAL

3.2.1.191 BE 5 64 IN ENTER (R) WITH I (48 BITS)

Clear register R and transfer the right-most 48 bits
of this instruction to the right-most 48 bits of
register R.

3.2.1.192 BF 5 64 IN INCREASE (R) BY I (48 BITS)

Replace the right-most 48 bits of register R by the
sum of those bits and the right-most 48 bits of this
instruction word. Arithmetic overflow is ignored.


3.2.1.193 C0 ILLEGAL

3.2.1.194 C1 ILLEGAL

3.2.1.195 C2 ILLEGAL

3.2.1.196 C3 ILLEGAL

3.2.1.197 C4 ILLEGAL

3.2.1.198 C5 ILLEGAL

3.2.1.199 C6 ILLEGAL

3.2.1.200 C7 ILLEGAL

3.2.1.201 C8 ILLEGAL

3.2.1.202 C9 ILLEGAL

3.2.1.203 CA ILLEGAL

3.2.1.204 CB ILLEGAL

3.2.1.205 CC ILLEGAL

3.2.1.206 CD 5 32 IN HALF-WORD ENTER (R) WITH I (24 BITS)

Clear register R and transfer the right-most 24 bits
of this instruction to the right-most 24 bits of
register R.

3.2.1.207 CE 5 32 IN HALF-WORD INCREASE (R) BY I (24 BITS)

Replace the right-most 24 bits of register R by the
sum of those bits and the right-most 24 bits of this
instruction word. Arithmetic overflow is ignored.
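
The ENTER and INCREASE instructions (BE, BF and their half-word forms CD, CE) can be modelled with two small Python functions. The names `enter` and `increase` are hypothetical. The sketch assumes that INCREASE leaves the bits to the left of the 48-bit (or 24-bit) field unchanged, which is how "replace the right-most bits" is read here; the specification text does not say so explicitly.

```python
def enter(immediate, bits=48):
    """BE (bits=48) or CD (bits=24): clear R, then load the right-most bits."""
    return immediate & ((1 << bits) - 1)


def increase(r, immediate, bits=48):
    """BF (bits=48) or CE (bits=24): add the immediate into the right-most bits.

    Overflow out of the field is ignored; bits to the left of the field
    are kept as they were (an assumption of this sketch).
    """
    mask = (1 << bits) - 1
    total = ((r & mask) + (immediate & mask)) & mask
    return (r & ~mask) | total
```

Adding 1 to a field of all ones wraps the field to zero without disturbing the left-most bits of the register.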

3.2.1.208 CF ILLEGAL

3.2.1.209 D0 ILLEGAL

3.2.1.210 D1 ILLEGAL

3.2.1.211 D2 ILLEGAL

3.2.1.212 D3 ILLEGAL

3.2.1.213 D4 ILLEGAL

3.2.1.214 D5 ILLEGAL

3.2.1.215 D6 ILLEGAL

3.2.1.216 D7 ILLEGAL

3.2.1.217 D8 ILLEGAL

3.2.1.218 D9 ILLEGAL

3.2.1.219 DA ILLEGAL

3.2.1.220 DB ILLEGAL

3.2.1.221 DC ILLEGAL

3.2.1.222 DD ILLEGAL

3.2.1.223 DE ILLEGAL

3.2.1.224 DF ILLEGAL

3.2.1.225 E0 ILLEGAL

3.2.1.226 E1 ILLEGAL

3.2.1.227 E2 ILLEGAL

3.2.1.228 E3 ILLEGAL

3.2.1.229 E4 ILLEGAL

3.2.1.230 E5 ILLEGAL

3.2.1.231 E6 ILLEGAL

3.2.1.232 E7 ILLEGAL

3.2.1.233 E8 ILLEGAL

3.2.1.234 E9 ILLEGAL

3.2.1.235 EA ILLEGAL

3.2.1.236 EB ILLEGAL

3.2.1.237 EC ILLEGAL

3.2.1.238 ED ILLEGAL

3.2.1.239 EE ILLEGAL

3.2.1.240 EF ILLEGAL

3.2.1.241 F0 ILLEGAL

3.2.1.242 F1 ILLEGAL

3.2.1.243 F2 ILLEGAL

3.2.1.244 F3 ILLEGAL

3.2.1.245 F4 ILLEGAL

3.2.1.246 F5 ILLEGAL

3.2.1.247 F6 ILLEGAL

3.2.1.248 F7 ILLEGAL

3.2.1.249 F8 ILLEGAL

3.2.1.250 F9 ILLEGAL

3.2.1.251 FA ILLEGAL

3.2.1.252 FB ILLEGAL

3.2.1.253 FC ILLEGAL

3.2.1.254 FD ILLEGAL

3.2.1.255 FE ILLEGAL

3.2.1.256 FF ILLEGAL
3.2.2 Instruction Execution Times

Instruction execution times are to be included in the
appropriate machine description specifications. See
Section 2.0.


4.0 TEST REQUIREMENTS (not applicable)

5.0 PREPARATION FOR DELIVERY (not applicable)

6.0 NOTES

6.1 ASCII/EBCDIC Reference Charts

The following table defines the control characters used in the
ASCII Reference Chart.


NUL   Null
SOH   Start of Heading (CC)
STX   Start of Text (CC)
ETX   End of Text (CC)
EOT   End of Transmission (CC)
ENQ   Enquiry (CC)
ACK   Acknowledge (CC)
BEL   Bell (audible or attention signal)
BS    Backspace (FE)
HT    Horizontal Tabulation (punched card skip) (FE)
LF    Line Feed (FE)
VT    Vertical Tabulation (FE)
FF    Form Feed (FE)
CR    Carriage Return (FE)
SO    Shift Out
SI    Shift In
DLE   Data Link Escape (CC)
DC1   Device Control 1
DC2   Device Control 2
DC3   Device Control 3
DC4   Device Control 4 (Stop)
NAK   Negative Acknowledge (CC)
SYN   Synchronous Idle (CC)
ETB   End of Transmission Block (CC)
CAN   Cancel
EM    End of Medium
SUB   Substitute
ESC   Escape
FS    File Separator (IS)
GS    Group Separator (IS)
RS    Record Separator (IS)
US    Unit Separator (IS)
DEL   Delete [1]

NOTE: (CC) Communication Control
      (FE) Format Effector
      (IS) Information Separator

[1] In the strict sense, DEL is not a control character.
(Chart: ASCII Reference Chart. LEGEND: each entry gives the ASCII
character, the card code, the EBCDIC character, and the EBCDIC code
in hexadecimal; the sample entry shows card code 11-8-2 and EBCDIC
code 5A.)

EXTENDED BINARY CODED DECIMAL INTERCHANGE CODE (EBCDIC) WITH PUNCHED CARD CODES AND ASCII TRANSLATION 



(Chart. LEGEND: each entry gives the EBCDIC character, the ASCII
character, and the ASCII code in hexadecimal.)

APPENDIX A


A1.0 SCOPE

The intent of this Appendix is to provide additional
information regarding some of the characteristics of
the CDC FMP. Further information can be found in
other appropriate specifications. (See the Applicable
Documents section.)


A2.0 SELF-MODIFYING PROGRAMS

The use of self-modifying programs is not allowed.
The following rules, which would have to be followed,
illustrate why this must be true.

The following rules apply to all programs:

1.  The twenty-four 64-bit words before (having
    addresses lower than current instruction word)
    and the thirty-two 64-bit words after (having
    addresses higher than current instruction word)
    the current instruction word shall not be
    modified by the current instruction.

2.  The twenty-four instructions before (in terms of
    order of execution) and the thirty-two
    instructions after (in terms of order of
    execution) the current instruction word shall
    not be modified by the current instruction.

3.  The store into Main Memory for the 13, 5F,
    and 7F instructions may not take place before
    the execution of the next instruction in
    sequence. Therefore, if these instructions are
    used to modify code, it is difficult to
    guarantee that the store has taken place before
    the execution of that code. There are three
    procedures to guarantee that the store has taken
    place prior to execution of the intended
    modified code.

    a.  The execution of any instruction which
        references Main Memory with the exception
        of the 12, 13, 32, 5E, 5F, 7E and 7F
        instructions. These instructions must be
        executed between the store instruction which
        modifies the code and the use of that
        modified code.

    b.  The execution of the conditional branch
        feature of the 32 instruction between the
        store instruction which modifies the code and
        the use of that modified code.

    c.  Execution of a load instruction (12, 5E, or
        7E) followed by a transmit (78) instruction
        where the source register for the 78
        instruction conflicts with the destination
        register for the load instruction. These
        instructions must be executed between the
        store instruction which modifies the code and
        the use of that modified code.

    The instructions referenced in a., b. and c.
    above must be executed from addresses at least
    four swords before or at least three swords after
    the modified code.
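
Rule 1 amounts to a protected window of addresses around the current instruction word. A minimal Python sketch of such a check follows. The function name `store_is_forbidden` is hypothetical, addresses are taken to be 64-bit word indices for simplicity, and the current instruction word itself is treated as protected as well; the last two points are assumptions of the sketch rather than statements from the rule.

```python
WORDS_BEFORE = 24   # 64-bit words protected below the current instruction word
WORDS_AFTER = 32    # 64-bit words protected above the current instruction word


def store_is_forbidden(current_word, target_word):
    """True if a store to target_word would violate rule 1.

    Both arguments are 64-bit word indices (an assumption of this sketch).
    """
    return current_word - WORDS_BEFORE <= target_word <= current_word + WORDS_AFTER
```

A program analysis tool could apply such a test to every store whose target address is known, although rules 2 and 3 depend on execution order and store timing and cannot be checked this simply.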

A3.0 INSTRUCTION STACK

Each machine has a different size instruction stack;
thus program optimization must be approached with
different parameters. Further information is
contained in the appropriate execution timing
specification.


Number of Words in Instruction Stack

CDC STAR-1B       1 64-bit word
CDC STAR-100     32 64-bit words
CDC STAR-100A   128 32-bit words
CDC FMP         128 32-bit words


A4.0 N/A 

A5.0 VECTOR FORMATS

In the CDC FMP, a vector is defined as a contiguous
set of bits, bytes or floating-point operands. The
contiguous set of bits or bytes is called a string,
while the contiguous set of floating-point elements
is called an array.

Operands are used in the following vector formats:

Array - a counted, variable-length, contiguous,
floating-point operand field. Vector operations
can be performed on defined fields consisting
entirely of 32-bit operands or entirely of 64-bit
operands.

Index List - a counted data array of integer
values in floating-point format.

A6.0 INVISIBLE PACKAGE

A6.1 Contents of the Invisible Package

The CDC FMP performs as specified with an addition.
Bit 12 of word 8 contains the stall bit. The stall
bit is a "1" if no data was processed during the last
job time-slice that resulted in the preparation of
the invisible package.

A6.2 Program Address Register

The only requirement on the program address stored
into the first location of the invisible package
when program interruption has occurred is that the
computer be able to restart the job from the same
point at which the interruption occurred.

The following table is included for information only.

Interrupt Condition                    Program Address Stored
                                       in Invisible Package

Exit force instruction at address A    A + 20 (base 16)
in Job Mode.

Illegal instruction with function      A
code less than 80 (base 16) at
address A in Job Mode.

Illegal instruction with function      A
code greater than or equal to 80
(base 16) at address A in Job Mode.

A7.0 DATA FLAGS

A7.1 Soft Interrupt Bit

Monitor software can set bit 35 of a job's data flag
branch register while the register is stored in the
job's invisible package. If, after exchanging back
to job mode, bit 35 and its corresponding mask bit
(bit 19) are set, a normal data flag branch occurs
following completion of the current instruction.

A7.2 Free Data Flags - Bits 56 and 57

The following are the definitions for free data
flags 56, 57, and 58:

Bit 56 - A CPU gate associated with the maintenance
station monitoring counters (see Section 3.6.4.1.1
of Eng. Spec. 10354637).

Bit 57 - A CPU gate associated with the maintenance
station monitoring counters (see Section 3.6.4.1.1
of Eng. Spec. 10354637).

Bit 58 - Not used on CDC FMP.

A7.3 Data Flag Branch

The automatic data flag branch can occur up to 35
instructions after the instruction which caused it.
The point at which the branch occurs can vary between
executions of the same program as a result of the
asynchronous I/O activity affecting the load/store
operations.

The following points pertain to the use of the data
flag register (DFR):

1.  The contents of the DFR as stored into the
    register file by a 3B instruction will
    reflect all previous activity on it. Also,
    activity prior to the 3B instruction will
    not affect the new contents of the DFR.

2.  Automatic data flag branches (ADFBs) caused by
    a 3B instruction or any instruction previous to
    it may occur after the next one or two
    instructions, but no later.

3.  Sampling or altering a data flag bit with
    a 33 instruction may occur out of sequence
    with a previous pipeline instruction up
    to 35 instructions earlier.

4.  If a 33 instruction alters a bit which
    causes an ADFB, the branch may occur up
    to two instructions later, regardless
    of the fact that all pipeline instructions
    previous to it may have finished.

    Again, if the ADFB is also contingent on
    the completion of a pipeline instruction,
    the automatic data flag branch may occur
    up to 35 instructions after the
    instruction which caused it.

When registers 1, 2 or 4 in the FMP register file are
altered by an instruction, and this instruction is
followed by an automatic flag branch or illegal
monitor instruction branch, the store operation may
happen out of sequence with the branch operation.
Thus, for example, if a 7E instruction loads register
4 with a certain value, and this instruction is
followed by an illegal monitor mode instruction, the
automatic branch will be to the address specified by
either the old or new contents of register 4,
depending on the timing of the 7E and the instruction
stream.

A8.0 ADDRESS DISCONTINUITIES

When addressing non-existent areas of memory the FMP
will generate an operand abort.
EXTERNAL INTERRUPT BIT ASSIGNMENT

The following chart describes the external interrupt
bit assignments.

Bit          CDC FMP

 0           I/O Channel 0
 1           I/O Channel 1
 2           I/O Channel 2
 3           I/O Channel 3
 4           I/O Channel 4
 5           I/O Channel 5
 6           I/O Channel 6
 7           I/O Channel 7
 8           I/O Channel 8
 9           I/O Channel 9
10           I/O Channel 10
11           I/O Channel 11
12           I/O Channel 12
13           I/O Channel 13
14           I/O Channel 14
15           I/O Channel 15
16           Monitor Interval
17 to 23     Non-Existent
24 to 39     Non-Existent

A10.0 04 4 64 NT BREAKPOINT - MAINTENANCE

The breakpoint instruction transfers R to the
breakpoint register. The breakpoint register is
used as a maintenance and program debugging aid.


|        | Usage |   Breakpoint Address   |        |
|        | Bits  |                        |        |
 0      8 9    15 16                    58 59    63

Bits 0-8 and 59-63 are not used.

The breakpoint address is compared with various
addresses such as the current instruction address,
READ 1 and READ 2 operand addresses, etc. If the
breakpoint address matches one of these addresses
and the proper usage bit is set, bit 47 of the data
flag branch register is set indicating a breakpoint.
Any combination of usage bits is permissible;
therefore, the breakpoint address can be checked
against any or all of the addresses listed below.

The breakpoint register is part of the invisible
package of a job.

Breakpoint Usage Bits

Bits 9-15 are breakpoint usage bits where if:

a.  Bit 9 is set, breakpoint on half-word contents of
    the program address register (P) just after the
    execution of the instruction at that location.

b.  Bit 10 is set, breakpoint on the READ 1 operand
    address for vector, or the read operand on random
    addressing instructions.

c.  Bit 11 is set, breakpoint on the READ 2 operand
    address for a stream instruction.

d.  Bit 12 is set, breakpoint on the WRITE 1 address
    for a stream instruction or the write operand on
    a random addressing instruction.

e.  Bit 13 is set, breakpoint
    vector or operand address
    instruction.

f.  Bit 14 is set, breakpoint on the READ 1 order
    vector address.

g.  Bit 15 is set, breakpoint on the READ 2 order
    vector address.

Breakpoint Compares

1.  When in job mode or monitor mode, addresses are
    compared with breakpoint. Since the monitor
    program does not have an invisible package, the
    breakpoint register must be set up each time the
    monitor program is entered. The breakpoint
    register is automatically cleared to zero during
    the exchange to the monitor.

2.  Program address compares are made on half-word
    boundaries, and all other compares are made on
    sword boundaries.

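
The breakpoint register layout given above can be unpacked with a few lines of Python. The names `field` and `decode_breakpoint` are hypothetical. The sketch assumes that bit 0 is the left-most (most significant) bit of the 64-bit register, as the left-to-right bit scale in the layout suggests; that reading is an assumption of the example.

```python
REGISTER_BITS = 64


def field(word, first, last):
    """Extract bits first..last (inclusive), counting bit 0 as the left-most bit."""
    shift = REGISTER_BITS - 1 - last
    width = last - first + 1
    return (word >> shift) & ((1 << width) - 1)


def decode_breakpoint(word):
    """Return (list of usage bit numbers that are set, breakpoint address)."""
    usage = [bit for bit in range(9, 16) if field(word, bit, bit)]
    address = field(word, 16, 58)    # 43-bit breakpoint address
    return usage, address
```

Bits 0-8 and 59-63 are simply ignored by the sketch, in line with the statement that they are not used.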
A11.0 06 7 NA MN FAULT TEST - MAINTENANCE

This instruction is used to set up modes which
modify certain logic functions in the CPU in order
that the fault sensing circuitry may be checked out.

The instruction is only enabled when bit 13 of word 6
in the job's invisible package is set (refer to Eng.
Spec. 10354637). If this bit is a zero, the 06
instruction will act as a no-op instruction.

The instruction is always enabled during monitor
mode.

The modes are set up by executing this instruction
with a "1" in the appropriate R designator bit and
are cleared by executing the instruction with a "0"
in the same bit location.

SECDED FAULTS

The test is initiated by executing an 06 instruction
with any combination of ones in bits 9 through 15 of
the instruction (R designator field) to complement
the respective checkword bits of all half-words
stored in Main Memory via the READ 3 bus. By
appropriate selection of data bits and
complementation of checkword bits when writing in
memory, one should be able to generate SECDED faults
on all Read buses. This should allow complete
checking of the Read SECDED hardware and also the
fault recording hardware for type and address of the
fault.

The forced complementing of the checkword bits is
discontinued by executing an 06 instruction with
bits 9 through 15 of the instruction (R designator
field) set to zero.

The S and T designators are undefined.

No interrupts or I/O memory requests can be allowed
during the execution of these tests.

A12.0 FLOATING-POINT SUBTRACT

The Instruction Descriptions Specification
(paragraph 3.1.4.6.3) defines the floating-point
subtract operation as "performed by complementing the
coefficient of the subtrahend and performing a
floating-point addition operation". It is further
added "that the complement of an 8000 0000 0000
coefficient is 4000 0000 0000 with one added to the
value of the exponent associated with the
coefficient".

The hardware used for floating add or subtract
operations has an extra (or extended) coefficient
sign bit. This means that the complementation of an
8000 coefficient is handled without the specified
right shift of one and increase of the exponent by
one. This will cause a result (although not
mathematically incorrect) which may differ from the
specified result when the following conditions are
met:

1.  The operand of the pair having the larger exponent
    (OR either of the two operands if their exponents
    are equal) must have a coefficient of 8000

2.  This operation must require this same operand to
    be complemented due to

    a.  being the subtrahend in a subtract operation
        OR

    b.  sign control in either a subtract or an add
        operation

3.  The "other" operand must have a negative
    coefficient.
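
The difference between the two ways of complementing an 8000 0000 0000 coefficient can be shown with a short Python sketch. All function names here are hypothetical, and the sketch models only the coefficient complement step with a 48-bit two's complement coefficient and an integer exponent; it does not model the rest of the floating-point add.

```python
COEF_BITS = 48
COEF_MASK = (1 << COEF_BITS) - 1
MOST_NEGATIVE = 1 << (COEF_BITS - 1)       # pattern 8000 0000 0000


def to_signed(pattern, bits=COEF_BITS):
    """Interpret a bits-wide pattern as a two's complement number."""
    return pattern - (1 << bits) if pattern & (1 << (bits - 1)) else pattern


def complement_as_specified(coef, exponent):
    """Complement per the quoted rule: 8000... becomes 4000... with exponent + 1."""
    if coef == MOST_NEGATIVE:
        return MOST_NEGATIVE >> 1, exponent + 1
    return (-coef) & COEF_MASK, exponent


def complement_with_extended_sign(coef, exponent):
    """Complement with one extra sign bit: the value fits, so no shift is needed.

    Returns the signed coefficient value rather than a 48-bit pattern.
    """
    return -to_signed(coef), exponent


def value(signed_coef, exponent):
    """Numeric value coefficient * 2**exponent (non-negative exponent assumed)."""
    return signed_coef * 2 ** exponent
```

Both routes represent the same number, +2^47 times the original scale, but the coefficient and exponent differ, which is why later steps such as a subtract upper can produce a different bit pattern from the specified one.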
If this operation is a subtract upper, the specified
result is indefinite (with the appropriate data
flags) while the CDC FMP result did not overflow. If
this operation were a subtract normalized, note the
following:


                          CDC FMP         Instruction
                                          Specification

Result of 6F              (0) 7FFFFF      70 3FFFFF
Subtract Upper

Normalize the 6F          7FFFFF          6F 7FFFFE
Upper Result
(3.1.4.7)
shifting zeros
in from the right

1.0 SCOPE

The CDC Flow Model Processor (FMP) Functional
Specification is intended to provide information of
the following types to either the user or maintenance
personnel.

o  Information that may be obtained about
   computer/program operation via the
   Maintenance Control Unit (MCU).

o  Changes in mode or operation internal to the
   FMP that may be made via the MCU or
   program that are not specified in the FMP
   Instruction Specification.

o  Information concerning computer operation
   that is of value in debugging
   software/hardware or in program optimization.

This specification is not intended to provide
information as to how a unit performs its specified
tasks such as would normally be found in a Theory of
Operation.


2.0 APPLICABLE DOCUMENTS

10354636 CDC Flow Model Processor Instruction
Specification


3.0 REQUIREMENTS


3.1 General Functional Description

The Flow Model Processor (FMP) is an extremely high
speed computational system designed specifically for
the solution of flow simulations related to the
design and construction of aerodynamic bodies. It is
based, in part, on the Control Data STAR-100
architecture, with both Main Memory and Scalar
Processor design taken from the STAR-100 family. The
resulting basic structure is augmented by a massive
Backing Storage capability (up to 256 million words
of CCD memory), a Swap Unit (to perform exchanges
between the Backing Store and the Main Memory), a
Map Unit (for gathering vector data from Main Memory
and storing results), and a Vector Unit (for the
computational portions of the problem solution).
Figure 3.1-1 shows the overall block diagram of the
FMP.



Figure 3.1-1 Basic CDC FMP Configuration
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3.1 


(Cont . ) 

Data and programs are entered into the FMP via the
Input/Output points attached to the Backing Store.
Once the various fragments of a job are aggregated in
the Backing Store, and the FMP is idle, the job is
"rolled" into the Main Memory via the Swap Unit.
Certain portions of the computations, all of the
bookkeeping and all of the FMP's overall control are
accomplished in the Scalar Processor. It is this
processor that interprets the instruction stream,
acts on those instructions which it can, and
distributes the remaining instructions to the
appropriate attached units (Map, Swap, Buffer,
Vector).

The FMP is designed to operate at a minor clock
cycle rate of ten nanoseconds, with all data
transfers, and all pipeline segments capable of
clocking a new data quantity (32, 64, 128, 512, or
2048 bits wide) every minor cycle. The maximum rate
of arithmetic results production in the 8 vector
pipelines then becomes 3 (operations peak rate) x 2
(32-bit results per pipeline) x 8 (pipelines) = 48 per
minor cycle of 10 nanoseconds = 4.8 billion
floating-point operations per second.


The Scalar, Map, Swap, and Vector Units are capable
of operating simultaneously so that a majority of
bookkeeping and data mapping (reorganization
functions) can be overlapped with the computation.
This enables the effective rate of problem solution
to approach 60% of the peak rate, or 2.8 billion
operations per second, which exceeds the original
objectives established for Navier-Stokes solutions
for flow field simulations.
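
The peak and effective rates quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch only; the constant and function names are illustrative and do not come from the specification. Note that 60% of the peak works out to 2.88 billion operations per second, which the text quotes as 2.8 billion.

```python
# Figures taken from the text; the names are illustrative assumptions.
MINOR_CYCLE_NS = 10          # minor clock cycle
OPS_PER_RESULT_PEAK = 3      # peak operations per result
RESULTS_PER_PIPELINE = 2     # 32-bit results per pipeline per minor cycle
PIPELINES = 8                # vector pipelines


def peak_ops_per_second() -> float:
    """Peak floating-point operations per second."""
    ops_per_cycle = OPS_PER_RESULT_PEAK * RESULTS_PER_PIPELINE * PIPELINES
    cycles_per_second = 1e9 / MINOR_CYCLE_NS
    return ops_per_cycle * cycles_per_second


def effective_ops_per_second(fraction_of_peak: float = 0.60) -> float:
    """Effective rate when overlap lets the job approach a fraction of peak."""
    return peak_ops_per_second() * fraction_of_peak


if __name__ == "__main__":
    print(f"peak      = {peak_ops_per_second():.3e} ops/s")
    print(f"effective = {effective_ops_per_second():.3e} ops/s")
```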

Unlike the STAR-100, the FMP is designed for
monoprogramming of computational jobs; thus there is
no virtual memory mechanism. All user jobs are
given the use of the entire eight million words of
Main Memory minus the first 65K words which are
reserved for the FMP monitor. This monitor area
cannot be accessed by the job mode programs. A
series of several monitor mode instructions permits
the management and allocation of the Backing Store,
as well as control communications with the I/O
processors attached to the FMP. These I/O
processors, called PDCs (Programmable Device
Controllers), are capable of intelligent control of
the I/O trunks (up to four attached to each PDC) and
intelligent communications with the monitor, as well
as providing 200-megabit data transfer rates between
the Backing Store and the trunks, and from 50 to
100-megabit transfer rate on the actual coax trunks
themselves.


The instruction set for scalar operations is a
compatible subset of the STAR-100 family which
supports most STAR software, with the addition of a
few operations made necessary by the unique I/O and
configuration provided on the FMP.

The Map Unit provides execution capability of the
STAR-100 "Iverson" operators of vector
MASK, MERGE, COMPRESS and SCATTER/GATHER while the
Vector Unit, in cooperation with the Map Unit,
performs the "Iverson" SELECT, SEARCH, and SEARCH
INDEXED LIST operations. The Map Unit is capable of
performing memory to memory operations while the
Vector Unit is performing buffer to buffer operations
independently. In addition, several combinations of
Memory, Buffer, and Vector Unit operations may be
invoked.


The Vector Unit performs the add, subtract, multiply,
divide operations commonly found on most processors,
in addition to a series of linked and macro
operations providing combinations of additions and
multiplications every minor cycle. The set of
linked operations chosen was based on the
characteristics of flow-model simulations that have
been analyzed by Control Data Corporation. In
addition to the simple combinations of add/subtract
and multiply, the functions SUM, PRODUCT, SUM OF
PRODUCTS, and PRODUCT OF SUMS are included for
matrix computations.

To ensure the reliability and maintainability of the
FMP, a number of error checking and recovery
facilities are built in, as well as a group of
maintenance functions which can be invoked by a
designated computer attached to one or more of the
I/O trunks. Single Error Correction, Double Error
Detection (SECDED) is carried through all data
trunks up to the functional unit actually using the
data. Checking for errors is done at several points
in the data path (for example at the Memory, at the
Map Unit, and at the Vector Unit) so that faults can
be quickly isolated, while the error correction is
applied at the point where the data is used, for
example the input stream of the Vector Unit.


Figure 3.1-2 CDC FMP Floor Layout


3.2 Scalar Processor

The Scalar Processor is physically contained in a 
cabinet, attached to the Vector Processor cabinet. 

Main Memory is attached to the Scalar Processor
cabinet in order to reduce transfer delays and gain
performance.

The Scalar Processor has synchronous internal logic 
with a clock period of 10 nanoseconds and is
implemented using LSI circuits. A block diagram of 
the functional components of the Scalar Processor is 
shown in Figure 3.2-1. 

The CDC FMP instruction control is contained in the
Scalar Processor. The Instruction Issue Unit
consists of two parallel parts, one for the monitor
program and one for the job program; it receives and
decodes all instructions from Main Memory. A
semiconductor instruction stack provides buffering
for eight swords for the job and one sword for the
monitor (512 bits per sword). The eight job swords
together can contain up to 128 32-bit instructions or
64 64-bit instructions or a mixture. The job
instruction stack can contain up to 6 discontiguous
swords with two swords lookahead. The Read Next Sword
(RNS) portion of the RNS/Branch Unit provides the
control for loading the instruction stack. The Branch
portion performs branch condition testing and executes
the branch instructions.

The Instruction Issue Unit is pipelined and is 
capable of issuing instructions at the rate of one 
instruction every 10 nanoseconds. The Instruction 
Issue Unit decodes all instructions and directs 
decoded stream instructions to the appropriate 
processor for execution. Thus, with independent 
vector and scalar instruction controls operating on a 
single instruction stream, the Scalar Processor can 
execute scalar instructions in parallel with most
stream instructions. 

The instruction stack contains eight superwords
(swords) containing 512 bits each. If an
instruction is referenced which is not presently in
the stack, the Issue Unit is halted and a memory
request is made for the sword containing the required
instruction. The sword thence brought from memory
must replace one of the swords already in the stack.
The sword that is thrown away or overlaid by the
incoming sword is the least recently used (LRU)
sword. Thus if swords numbered consecutively 0
through 7 have been executed without any intervening
branches, sword 8 (required by the next consecutive
instruction) would be brought from memory and
overlaid in the stack in the position originally held
by sword number 0 which, in this case, is the LRU
sword.
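
The replacement rule described above can be illustrated with a small software model. This is a sketch under stated assumptions, not a description of the hardware: the class name InstructionStack, its methods, and the helper instructions_per_sword are all hypothetical, and the model tracks only which swords are resident, not their physical positions in the stack.

```python
from collections import OrderedDict

SWORD_BITS = 512      # bits per sword, from the text
STACK_SWORDS = 8      # job instruction stack capacity, from the text


class InstructionStack:
    """Hypothetical model of an LRU-replaced stack of swords."""

    def __init__(self, capacity: int = STACK_SWORDS):
        self.capacity = capacity
        # Keys are sword numbers, ordered from least to most recently used.
        self._swords = OrderedDict()

    def reference(self, sword: int):
        """Reference a sword; return the sword it evicts, or None."""
        if sword in self._swords:
            self._swords.move_to_end(sword)   # now the most recently used
            return None
        evicted = None
        if len(self._swords) >= self.capacity:
            evicted, _ = self._swords.popitem(last=False)  # drop the LRU sword
        self._swords[sword] = None
        return evicted

    def resident(self):
        """Sword numbers currently held, least recently used first."""
        return list(self._swords)


def instructions_per_sword(instruction_bits: int) -> int:
    """Number of instructions of the given width that fit in one sword."""
    return SWORD_BITS // instruction_bits


if __name__ == "__main__":
    stack = InstructionStack()
    for sword in range(8):
        stack.reference(sword)
    print("evicted by sword 8:", stack.reference(8))
    print("resident:", stack.resident())
```

Running swords 0 through 7 in order and then referencing sword 8 evicts sword 0, matching the example in the text. With 32-bit instructions a sword holds 16 instructions, so eight swords hold 128.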


Figure 3.2-1 Scalar Processor Block Diagram















CONTROL DATA CORPORATION
ENGINEERING SPECIFICATION
NO. 10354637
DATE Dec. 1977
RADL

The Load/Store Unit provides special handling of the
Load and Store instructions. The unit acts as a
pipeline and is capable of accepting new requests at a
rate of one load every minor cycle or one store
every two minor cycles, provided a memory busy
or register file write-bus busy does not occur. A
circular buffer containing six registers provides
buffering for up to six load requests, or three store
requests, or a mixture of loads and stores.

The Load/Store Unit is capable of loading a randomly 
accessed word of data from Main Memory into the 
Register File in 150 nanoseconds after reading the 
base address and item count of the data. This time 
assumes a memory busy or register file write-bus busy 
does not occur. A memory busy would add up to 40 
nanoseconds to the load time. 

The Scalar Floating-Point Unit contains completely 
independent functional elements to attain high scalar 
performance. The following are the times in 
nanoseconds to produce a 32-bit or 64-bit result in
each functional element. These times correspond to
the shortstop times. Shortstop is the process by
which a result from any arithmetic element may be
returned directly to either input of any arithmetic
element. This occurs in parallel with the storing of
the result in the Register File. Shortstop 
eliminates the time necessary to store the result in 
the Register File and then retrieve it for use in the 
next arithmetic operation. 


Add/Subtract Pipe              50 ns
Multiply Pipe                  50 ns
Shift/Logical Pipe             40 ns
Single Cycle Pipe              10 ns
Divide/SQRT/Convert Element   240 ns


The pipe elements are segmented and capable of
accepting new operands every 10 nanoseconds. The
Divide/SQRT/Convert element must complete each
operation before a new one can begin. All elements
are capable of being shortstopped. The Scalar
Processor contains a semiconductor Register File
which provides 256 64-bit registers for use in
instruction and operand addressing, indexing, field
lengths, and as source and destination registers for
scalar instruction operands and results. The Register
File is capable of two reads and one write every 10
nanoseconds.
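
The difference between the segmented pipes and the unsegmented Divide/SQRT/Convert element shows up when several operations are issued back to back. The sketch below is illustrative only. It assumes independent operands, one issue per minor cycle, and no memory or register conflicts; the dictionary ELEMENTS and the function name are hypothetical, while the latencies are the shortstop times listed above.

```python
MINOR_CYCLE_NS = 10

# element name -> (shortstop latency in ns, True if segmented)
ELEMENTS = {
    "add_subtract": (50, True),
    "multiply": (50, True),
    "shift_logical": (40, True),
    "single_cycle": (10, True),
    "divide_sqrt_convert": (240, False),
}


def time_for_independent_ops(element: str, count: int) -> int:
    """Nanoseconds until the last of `count` independent results is ready."""
    latency_ns, segmented = ELEMENTS[element]
    if count <= 0:
        return 0
    if segmented:
        # A new operand pair enters every minor cycle behind the first.
        return latency_ns + (count - 1) * MINOR_CYCLE_NS
    # An unsegmented element finishes each operation before starting the next.
    return count * latency_ns


if __name__ == "__main__":
    for name in ELEMENTS:
        print(name, time_for_independent_ops(name, 10), "ns for 10 operations")
```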


3.2.1 Scalar Processor Error Checking

The basic design of the FMP Scalar Processor is based
on the design of the STAR-100A and STAR-100B Scalar
Processors. In these designs (already being
implemented) there exists a moderate amount of error
checking on buses:

a. SECDED - All data buses in and out of the
Scalar Processor carry seven bits of single
error correction, double error detection
code for each 32 bits of data. The
data buses are the Load/Store data bus (64
bits wide), the Instruction Read data bus
(128 bits wide), the Register File exchange
path (128 bits wide each way), and the
Register File data bus to the Map Unit (128
bits wide).

b. Parity - All microcode memories in the Issue
and Floating-Point Units contain parity bit
checking. The microcode carries a parity
bit from the time it is assembled on a
front-end processor until it is read during
execution in a given unit. A parity fault
causes an immediate stoppage of the CPU, and
an error flag to be sent to the Maintenance
Control Unit (MCU). The instruction stack
contains parity information in like manner.

c. Illogical function - Communication between
the various functional elements of the
Scalar Processor is performed by sequences
of microcode generated function codes, which
are decoded at the receiving end by
microcode. Sufficient entropy has been
included in the function code scheme to
permit some detection of internal control
signal failures.


The Scalar Processor for the FMP is being examined
closely to see if parity (1 bit for each 8 bits of
data) can be included in all internal data trunks and
functional elements. Checking of the arithmetic
elements (with the exception of the multiply element)
can be accomplished by this method, thus ensuring a
high degree of integrity for this unit. The penalty
for this measure, however, can be a seriously reduced
performance for some scalar operations (particularly
where recursion is invoked). An examination of the
tradeoffs of cost, performance, and reliability will
have to await more detailed design and analysis of
the Scalar Processor.


3.2.2 SECDED (Single Error Correction Double Error
Detection)

The SECDED error information is logged by the
Maintenance Control Unit (MCU). The information
logged is syndrome word, single error, double error,
read bus code, and CPU word address bits 37-58.


SECDED ERROR INFORMATION 

1. SYNDROME BITS - These seven bits are generated by
the error correcting code. The 39 unique
syndrome words for single bit errors are listed
in Table 3.2-1. Of these 39 (odd bit) syndrome
words, only the 32 data bit codes will toggle a
bit when error correction is enabled. Other odd
bit codes latched in SECDED that differ from the
39 unique syndrome words will be flagged by the
MCU as a multiple odd bit error. Double error
syndrome words have an even number of bits.

2. SINGLE ERROR - Bit 5 of channel ATSS (see section
3.6.1) will set if there is a single error not
preceded by a double error.

3. DOUBLE ERROR - This MCU display register will
set unconditionally on a double error.

4. SECDED FAULT BUS CODE - These MCU display
registers define the read bus on which the
SECDED error occurred.


   Read Bus Code

   CODE 0 = I/O
   CODE 1 = R1
   CODE 2 = R2
   CODE 3 = R3
   CODE 4 = Scalar
   CODE 5 = RNI
   CODE 6 = Swap

   The error logging priority for simultaneous SECDED
   errors on multiple buses is:

      1. RNI
      2. SCALAR
      3. R2
      4. R1
      5. SWAP
      6. I/O
      7. R3

5. HALF-WORD ADDRESS (Bits 57-58) - These address
bits decode the four 32-bit groups within a
quarter sword. The error logging priority for
simultaneous SECDED errors in more than one
half-word is, in order: HW0, HW1, HW2, and HW3.

6. CPU WORD ADDRESS (Bits 37-56) - These address
bits indicate the following:

   Bits 37-39   Select 1 of 8 Memory Chips
   Bits 40-49   Select 1 of 1K Memory Cells
   Bit  50      1024K Select
   Bit  51      512K Select
   Bits 52-54   Bank Select
   Bits 55-56   Quarter Sword Select

7. LATCHED ADDRESS BITS (37-58) - In SECDED these
address bits are always the physical CPU Word
Address Bits.
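
The two priority orders above (by read bus and by half-word) can be expressed as a simple selection rule. The sketch below is a hypothetical illustration. It assumes that bus priority is applied before half-word priority, which the text does not state explicitly, and the function and constant names are not from the specification.

```python
# Priority orders as listed in the text, highest priority first.
BUS_PRIORITY = ["RNI", "SCALAR", "R2", "R1", "SWAP", "I/O", "R3"]
HALF_WORD_PRIORITY = ["HW0", "HW1", "HW2", "HW3"]


def select_error_to_log(errors):
    """Pick which simultaneous error is logged.

    `errors` is a non-empty list of (bus, half_word) pairs reported in the
    same minor cycle. Assumption: bus priority is decided first, then
    half-word priority within the winning bus.
    """
    return min(
        errors,
        key=lambda e: (BUS_PRIORITY.index(e[0]), HALF_WORD_PRIORITY.index(e[1])),
    )


if __name__ == "__main__":
    simultaneous = [("R3", "HW0"), ("SCALAR", "HW2"), ("SCALAR", "HW1")]
    print(select_error_to_log(simultaneous))
```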


TABLE 3.2-1 UNIQUE SYNDROME WORDS FOR SINGLE BIT FAILURES

Bit   Data           Syndrome Word
 0    80000000       70
 1    40000000       68
 2    20000000       58
 3    10000000       64
 4    08000000       54
 5    04000000       7C
 6    02000000       7A
 7    01000000       76
 8    00800000       1C
 9    00400000       1A
10    00200000       16
11    00100000       19
12    00080000       15
13    00040000       1F
14    00020000       5E
15    00010000       5D
16    00008000       07
17    00004000       46
18    00002000       45
19    00001000       26
20    00000800       25
21    00000400       67
22    00000200       57
23    00000100       37
24    00000080       61
25    00000040       51
26    00000020       31
27    00000010       49
28    00000008       29
29    00000004       79
30    00000002       75
31    00000001       60
32    Check Bit 0    40
33    Check Bit 1    20
34    Check Bit 2    10
35    Check Bit 3    08
36    Check Bit 4    04
37    Check Bit 5    02
38    Check Bit 6    01

The syndrome word is latched if the bit shown in the
data pattern in Table 3.2-1 is in error. For
example, if and only if bit 0 failed on any data
pattern, then the syndrome word would be 70.
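
The way a latched syndrome word is interpreted (odd number of bits for a single error, even number for a double error, and a table lookup to name the failing bit) can be sketched as follows. This is an illustration only, not the MCU's actual procedure. The dictionary holds just a partial copy of Table 3.2-1 (two data bits and the seven check bits), so any other odd syndrome falls into the last category here; a complete decoder would need all 39 entries. The function name and return strings are hypothetical.

```python
# Partial copy of Table 3.2-1 (syndrome word -> failing bit). Illustrative only.
SINGLE_BIT_SYNDROMES = {
    0x70: "data bit 0",
    0x68: "data bit 1",
    0x40: "check bit 0",
    0x20: "check bit 1",
    0x10: "check bit 2",
    0x08: "check bit 3",
    0x04: "check bit 4",
    0x02: "check bit 5",
    0x01: "check bit 6",
}


def classify_syndrome(syndrome: int, table=SINGLE_BIT_SYNDROMES) -> str:
    """Classify a 7-bit syndrome word by its parity and a table lookup."""
    if not 0 <= syndrome < 0x80:
        raise ValueError("syndrome must fit in seven bits")
    if syndrome == 0:
        return "no error"
    ones = bin(syndrome).count("1")
    if ones % 2 == 0:
        return "double error"          # even number of syndrome bits
    if syndrome in table:
        return "single error: " + table[syndrome]
    return "multiple odd bit error"    # odd, but not a tabulated single-bit code


if __name__ == "__main__":
    for word in (0x00, 0x70, 0x03, 0x7F):
        print(f"{word:02X}: {classify_syndrome(word)}")
```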

The SECDED error latching hardware has two basic
modes of operation - Mode 1 and Mode 2.

Selection between the two modes is accomplished
through the MCU/CPU Maintenance Line called SELECT
SECDED ERROR LOG MODE TWO.

For both modes, in the event of simultaneous SECDED
errors, the information to be latched is dependent on
the relative priority of the data buses or half-words
which contain the errors. All information will be
correct for the error selected. It is possible in
both modes to encounter a single and double error
simultaneously and latch the single error. The double
error flag will set unconditionally. Therefore,
if the double error flag is set, the syndrome bits
must be checked to determine if a single or double
error was latched. In the event the single error flag
is set, and no double error, the error will be a
single error.


Mode 1

The first error to occur after a master clear or
error clear will have its error information latched.
The information will be correct in all cases,
regardless of subsequent errors. If a double error
follows a single without an error clear, the double
error information will be lost.

Mode 2

Operation in Mode 2 is the same as in Mode 1 except
for the following enhancement: An attempt will be
made to latch the error information for the first
double error encountered whether or not a single
error has previously been latched.

As in Mode 1, the double error flag will set
unconditionally when a double error is encountered.
However, other aspects of Mode 2 operation are less
certain. The conditions which may result are listed
below:

Case 1

In the event of simultaneous errors, Mode 2 is the
same as Mode 1. If the double error flag is set, the
syndrome bits must be checked to determine if a
single or double error was latched.

Case 2

If the SECDED checker encounters a single or several
single errors, and the double error flag is not set,
then the error information will be that of the
first single error. All information is correct as in
Mode 1.

Case 3

If the SECDED checker encounters a double followed by
other double or single errors then the error
information will be that of the first double error.
All information is correct as in Mode 1. However,
the MCU cannot distinguish this case from Case 1 with
the double error latched, so the syndrome bits must be
checked.

Case 4

If the SECDED checker encounters a single error and N
minor cycles later (N<8) a double error is
encountered, address bits 37 through 54 for either the
single or double error may be latched; bits 55 and 56
are indeterminate; and the remaining error
information would be that of the double error.

Case 5

If the SECDED checker encounters a single error and N
minor cycles later (N>8) a double error is
encountered, the double error information will be
correct. However, the MCU cannot distinguish this
case from Case 4.



Case 6

If the SECDED checker encounters a double error and
one or more minor cycles later a single or double
error is encountered, this is simply Case 3. The
first double error information will be latched.

Mode 2A Double Error Log

This mode is electronically identical to Mode 2.
The difference is strictly operational.
Specifically, after a master clear or error clear,
the MCU deliberately creates a single error using
the maintenance function to toggle a check bit.
This error is not cleared, and effectively blocks
detection of all subsequent single errors.
Consequently, when the MCU detects the double error
flag, it knows that this is Case 5 and the error log
information is correct for that double error.
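
The difference between the Mode 1 and Mode 2 latching rules can be illustrated with a simplified software model. The sketch below is hypothetical: the class SecdedLatch and its methods are not part of the specification, errors are reported one at a time, and the timing-dependent behavior of Cases 1 and 4 (simultaneous errors, and a double error arriving within eight minor cycles of a single error) is deliberately not modeled.

```python
class SecdedLatch:
    """Simplified, hypothetical model of the SECDED error log latch."""

    def __init__(self, mode: int = 1):
        if mode not in (1, 2):
            raise ValueError("mode must be 1 or 2")
        self.mode = mode
        self.clear()

    def clear(self):
        """Model a master clear or error clear."""
        self.latched = None        # (kind, info) of the error whose log is held
        self.double_flag = False

    def report(self, kind: str, info: str):
        """Report one error; kind is 'single' or 'double'."""
        if kind not in ("single", "double"):
            raise ValueError("kind must be 'single' or 'double'")
        if kind == "double":
            self.double_flag = True            # sets unconditionally
        if self.latched is None:
            self.latched = (kind, info)        # first error after a clear
        elif (self.mode == 2 and kind == "double"
              and self.latched[0] == "single"):
            self.latched = (kind, info)        # first double replaces a single


if __name__ == "__main__":
    for mode in (1, 2):
        latch = SecdedLatch(mode)
        latch.report("single", "bus R1")
        latch.report("double", "bus R2")
        print("Mode", mode, "latched:", latch.latched, "flag:", latch.double_flag)
```

Mode 2A corresponds, in this model, to calling `report("single", ...)` once immediately after `clear()` in Mode 2, so that any later latched double error is known to have followed a single error.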


BLOCK WRITE ENABLES

The MCU has the capability to block the write
enables if a SECDED error occurs. There are two
options which can be selected depending on SECDED
error mode.

1. With Mode 1, the write enables will be blocked
when SECDED receives its first single or double
error.

2. With Mode 2, the write enables will be blocked
when SECDED receives its first double error.

COMPLEMENT I/O CHECKWORD BITS

This maintenance feature enables the MCU to toggle
the Write I/O checkword bits before write into
memory. Toggling the 128 combinations on each
half-word of the six Read Data Buses allows checkout
of the SECDED checker.



GENERAL USAGE

Mode 1 is a good SECDED latch design for a memory
with low error rate. All error log information is
correct. However, it will not latch the double error
if it follows a single error within the cycle time
of the MCU.

Mode 2 is a better SECDED latch design for a memory
with a high error rate. All single errors latched
are correct, and all double errors following a
single error by greater than eight minor cycles
(80 ns) are correct. A double error occurring
before a single error is also latched correctly.

Mode 2A is a double error logging system for use
if single errors are to be ignored. This mode will
miss the double error only if there is a simultaneous
single error with higher latching priority. If this
condition should occur, a diagnostic requesting only
one bus will get around the bus priority. If the
diagnostic fails and still latches a single error,
then the double error is in a lower priority half
word.


3.2.3 Associative Unit

N/A


3.2.4 Instruction Issue/Decode

All instructions are read from memory by the Scalar
Processor and decoded for subsequent issue. This is
accomplished in the Issue Unit which is composed of
two parts, one for monitor and one for job. After
decoding an instruction, the Issue Unit issues it to
the unit responsible for its execution: the Vector
Unit, the Swap Unit, or the Scalar Processor itself.
Responsibilities for all instructions are shown in
Table 3.2-2.

These units are essentially independent of one
another and can execute instructions in parallel.

The remainder of section 3.2 provides additional
information on Scalar Processor operation. Section
3.3 describes the Vector Unit and section 3.7 covers
the Swap Unit.


TABLE 3.2-2 INSTRUCTION RESPONSIBILITY
First Digit of Instruction Code


{ 



1 0 

1 

2 

3 

4 

5 

6 

7 

*8 

9 


OIS 

S 

S 

s 

s 

S 

s 

s 

I 

I 


1 

iis 

3 

S 

s 

s 

s 

s 

s 

I 

I 


t 

2iS 

S 

I 

s 

s 

s 

s 

s 

I 

I 


I 

311 

1 

S 

I 

s 

I 

s 

s 

s 

I 

I 

Second 

1 

41S 

1 

I 

I 

s 

s 

s 

s 

s 

I 

I 

Digit 

1 

51 1 
1 

6IS 

1 

I 

I 

s 

s 

s 

s 

s 

I 

I 

o f 

I 

I 

s 

s 

X 

s 

s 

I 

I 

Instruc- 

1 

711 

I 

I 

s 

I 

I 

s 

s 

I 

I 

tion 

1 

t 










Code 

81S 

1 

I 

I 

s 

s 

s 

s 

s 

I 

I 


9IS 

1 

AtS 

I 

I 

s 

s 

s 

s 

s 

I 

I 


I 

I 

s 

I 

s 

I 

s 

I 

I 


1 

Bl I 

i 

1 

I 

S 

s 

s 

s 

s 

s 

I 

I 


1 

CII 

I 

s 

s 

3 

s 

s 

s 

I 

I 


I 

Oil 

1 

EIS 

1 

FlI 

I 

s 

s 

s 

s 

s 

X 

I 

V 


I 

s 

s 

• S 

s 

s 

s 

. I 

V 


I 

s 

s 

S 

s 

s 

s 

I 

V 


A B C D E F 
IS I I I I 
I'S I I I I 
IS I I I I 
IS I I I I 

IS l’ I I I 
IS I I I I 
IS I I I I 
II I I I I 

II I I I I 
II I I I I 

II I I I I

II I I I I

II I I I I 
II Sill 
IS sill 
IS I I I I 


S - Executed within the Scalar Processor (Note that
Data Flag information will be passed to the Data
Flag Register in the Vector Processor for
appropriate instructions).

V - The Scalar Processor initiates the Vector
Processor to execute portions (or all) of the
instructions.

I - Illegal instruction.

X - Executed in the Swap Unit.


3.2.5 Register File

The Register File of the FMP contains 256
64-bit words. This Register File is capable of
accomplishing two read operations and one write
operation every 10 nanosecond minor cycle. In
addition, the Register File can be exchanged at the
rate of two registers in and two out every minor
cycle. A complete swap of the Register File is
accomplished in 256 10-nanosecond minor cycles plus
set-up time.

The FMP has 16 Result Address Registers (RAR) used
to conflict check each scalar instruction ready for
issue against register file addresses that are to be
written by an already issued scalar instruction. If
a conflict exists, the action taken depends on
whether the needed result can be shortstopped or not
(see sections 3.2 and 3.2.8 for additional
information on shortstop). If shortstop is possible,
the instruction is issued at the appropriate time
and instruction issue continues. If shortstop is
not possible (e.g., the result of a previous load is
needed), issue stops.

The RARs are set sequentially from the result
register designators of issued scalar instructions.
They are cleared when the result is written into the
Register File.
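
The conflict check performed with the Result Address Registers can be sketched as a small scoreboard. This is a hypothetical illustration rather than the hardware design: the class and method names are invented, the number of RARs is not modeled, and "issued at the appropriate time" is reduced to an immediate issue.

```python
class ResultAddressRegisters:
    """Hypothetical scoreboard of register results that are still pending."""

    def __init__(self):
        # destination register -> True if its result can be shortstopped
        self.pending = {}

    def try_issue(self, sources, destination, shortstoppable=True):
        """Check one scalar instruction against the pending results.

        Returns 'issue', 'issue via shortstop', or 'stall'.
        """
        conflicts = [reg for reg in sources if reg in self.pending]
        if any(not self.pending[reg] for reg in conflicts):
            return "stall"     # e.g. the result of a previous load is needed
        self.pending[destination] = shortstoppable   # set an RAR entry
        return "issue via shortstop" if conflicts else "issue"

    def write_back(self, destination):
        """Clear the entry when the result is written to the Register File."""
        self.pending.pop(destination, None)


if __name__ == "__main__":
    rar = ResultAddressRegisters()
    print(rar.try_issue([1, 2], 3))                     # no conflict
    print(rar.try_issue([3], 4))                        # needs pending result 3
    print(rar.try_issue([], 5, shortstoppable=False))   # a load into register 5
    print(rar.try_issue([5], 6))                        # must wait for the load
```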


3.2.6 Branch/Instruction Stack 

The Branch instruction execution times may be found
in section 3.9 of this Specification.

The instruction stack implemented in the FMP
accommodates up to 8 swords (512 bits per sword), 6
of which may be discontiguous. To sustain the
instruction rate a two-sword "lookahead" will be done
by reading the two swords following the one being
executed. Issue will not be blocked if the swords
following are not in the stack.

An address is maintained for each of the eight swords
so that out-of-stack branches may be taken without
voiding the entire stack. For instance, it would be
possible to call a subroutine of up to 3 swords (48
instructions of 32 bits each) several times from a
three sword instruction stream and never branch
out-of-stack after the first branch which loads the
subroutine into the stack.


Self-Modifying Programs

The material describing the restrictions concerning
self-modifying programs will be added at a later
date. This will be similar to that found in
paragraph A2.0 of Eng. Spec. 10354636.


3.2.7 Load/Store Unit

The Load/Store Unit executes the 12, 13, 32, 5E, 5F,
7E and 7F instructions. There are six address
registers in the Load/Store Unit which enable
requests to be stacked and executed in the proper
order. The 12, 5E and 7E instructions each require
one register and can be executed (in the absence of
memory conflicts) at the rate of one load per minor
cycle. The 5F and 7F instructions each require two
address registers and can be executed at one store
per two minor (10 ns) cycles. The 13 and 32
instructions each require two address registers which
are then busy for 17 minor cycles.

The Load/Store Unit is thus capable of streaming
Load/Store instructions (other than the 13 and 32)
at one minor cycle per load and two minor cycles per
store assuming no Memory or Register File conflicts.
For example, a stream of N loads will execute in N +
14 minor cycles from the issue of the first load
until the operand from the last load is available in
the Register File. A stream of N stores will
execute in 2N + IN minor cycles from issue of the 

I 3 

first store until issue of the last store. 
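
The load-stream timing and the address-register budget described in this section can be checked with a short calculation. The sketch below is illustrative only and assumes no Memory or Register File conflicts; the function names are hypothetical. It covers loads only, together with the six-register buffer rule (one register per load, two per store). As a cross-check, a single load (N = 1) gives 15 minor cycles, or 150 nanoseconds, which agrees with the random-access load time quoted in section 3.2.

```python
MINOR_CYCLE_NS = 10
LOAD_STREAM_OVERHEAD_CYCLES = 14   # from "N + 14 minor cycles"
ADDRESS_REGISTERS = 6              # Load/Store Unit address registers


def load_stream_cycles(n_loads: int) -> int:
    """Minor cycles from issue of the first load until the last operand
    is available in the Register File."""
    if n_loads < 1:
        raise ValueError("need at least one load")
    return n_loads + LOAD_STREAM_OVERHEAD_CYCLES


def load_stream_ns(n_loads: int) -> int:
    """The same interval expressed in nanoseconds."""
    return load_stream_cycles(n_loads) * MINOR_CYCLE_NS


def fits_in_buffer(loads: int, stores: int) -> bool:
    """True if the outstanding requests fit in the six address registers
    (one register per load, two per store)."""
    return loads + 2 * stores <= ADDRESS_REGISTERS


if __name__ == "__main__":
    print(load_stream_ns(1), "ns for a single load")
    print(load_stream_cycles(100), "minor cycles for 100 loads")
    print(fits_in_buffer(2, 2), fits_in_buffer(1, 3))
```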


3.2.8 Scalar Floating Point

The FMP has an arithmetic unit dedicated to scalar
(non-vector) operations. This Scalar Floating-Point
Unit will be divided into five separate functional
elements: one each for add/subtract, multiply and
logical, a single cycle element for the add/subtract
address and transmit type instructions, and one
combining divide, square root and convert.

All elements of the arithmetic unit are separately
and independently controlled to allow concurrent
operation. However, only one operand pair is issued
to the arithmetic unit each minor cycle so this
becomes the limiting factor determining the result
rate from concurrent operations.

The first four are effectively segmented pipeline
elements which accept a new pair of operands every
minor cycle. They each produce a 64 or 32-bit
result every minor cycle. The divide - sq. rt. -
convert element is not segmented and thus accepts
operands only at completion of the previous
operation, every 28 minor cycles per 64-bit operand.
Using 32-bit operands would approximately double the
result rate of the divide - sq. rt. and convert.

Interface Between Scalar Floating Point and Scalar
Control Unit

Input Trunks

There are three input trunks to the Scalar Floating-
Point Unit. The characteristics of these trunks are
outlined in the following description. All input
operands are treated as 64 or 32-bit floating-point
quantities, except as noted. If an indefinite or
machine zero floating-point operand is received, its
coefficient will be set to all zeros.

A Input Trunk

This trunk is 64 bits wide. It receives 64 data
bits from register location R in the following
format:

64-Bit Mode

Bits          0-15       16-63
Information   exponent   coefficient

32-Bit Mode

Bits          0-7        8-15              16-39         40-63
Information   exponent   exponent          coefficient   zeros
                         (copy of 00-07)

All bits transferred on this trunk should be held on
the trunk for a period of 10 nsec, measured at the
input to the Scalar Floating-Point Unit.
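
The two trunk formats above can be illustrated by packing and unpacking the fields of a 64-bit word. This is a sketch under one explicit assumption: bit 0 is taken to be the most significant bit of the trunk, consistent with the data patterns in Table 3.2-1. The function names are hypothetical, and the fields are treated as raw unsigned bit fields with no interpretation of sign or special values.

```python
def unpack_64bit_mode(word: int):
    """Split a 64-bit trunk word into (exponent, coefficient).

    Assumes bit 0 is the most significant bit: bits 0-15 hold the exponent
    and bits 16-63 hold the coefficient.
    """
    if not 0 <= word < (1 << 64):
        raise ValueError("word must fit in 64 bits")
    exponent = (word >> 48) & 0xFFFF
    coefficient = word & ((1 << 48) - 1)
    return exponent, coefficient


def pack_32bit_mode(exponent: int, coefficient: int) -> int:
    """Place a 32-bit operand on the 64-bit trunk.

    Bits 0-7 carry the exponent, bits 8-15 a copy of it, bits 16-39 the
    coefficient, and bits 40-63 are zeros.
    """
    if not 0 <= exponent < (1 << 8):
        raise ValueError("exponent must fit in 8 bits")
    if not 0 <= coefficient < (1 << 24):
        raise ValueError("coefficient must fit in 24 bits")
    return (exponent << 56) | (exponent << 48) | (coefficient << 24)


if __name__ == "__main__":
    word = pack_32bit_mode(0xAB, 0x123456)
    print(f"{word:016X}")
    print(unpack_64bit_mode(0x0001000000000002))
```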


B Input Trunk

The B trunk receives data from register location S
and is identical to the A trunk.


Control Trunk

The control trunk carries the signals which control
the Scalar Floating-Point Unit. It is made up of
the following signals:

Control Address

The control address bits are the bits that
select the proper set of internal control
signals for the floating-point instruction
being executed. There is a unique code for
each instruction as listed in Table 3.2-3.

Using the input data to the Floating-Point Unit
as a reference, these control bits must arrive
at the floating-point logic 1.5 cycles ahead of
the data and be valid for 10 nsec.

Mode Controls

The mode controls are Mode 64 In, Mode 64 Out,
G-bits and Divide. The Mode 64 and G-bit lines
must lead the input data by 1.0 minor cycles and
the Divide signal must lead by 1.5 minor cycles.
These should remain up for 10 nsec.

Issue Controls

These controls are S-Shortstop, R-Shortstop,
S-Clockgate, R-Clockgate, S-Shortstop Enable,
R-Shortstop Enable and Go. These controls all
must be valid 1.0 cycles ahead of the data.

The Shortstop Enable signals enable the setting
or clearing of the Shortstop control flip-flops.
The Shortstop set or clear signals
cause data to be clocked into the floating-
point input registers when these signals are a
one. The Go signal tells the Floating-Point
Unit to begin processing the operands that are
in the input registers. 
TABLE 3.2-3 INSTRUCTION CODES

         M64  M64  CONTROL                 CYCLE  BUSY  A      B      OUTPUT
INSTR    IN   OUT  ADDRESS  G-BITS    DIV  TIME   TIME  TRUNK  TRUNK  CONTROL

10       1    1    01                 1    20     17    0      R      DT,DB,DFLG39
11       1    1    02                 1    53     50    0      R      DT,DB
20       1    1    10                 0           0     R      S
21       1    1    11                 0           0     R      S
2A       1    1    18                 0    3      0     I      R
2B       1    1    19                 0    3      0     I      R
2C       1    1    1A                 0    3      0     R      S
2D       1    1    1B                 0    3      0     R      S
2E       1    1    1C                 0    3      0     R      S
2F       1    1    1D       G2,G3     0    1      0     0      T
30       1    1    1E                 0    3      0     R      S
31       1    1    1F                 0    1      0     R      +1
34       1         20                 0    3      0     R      S
35       1    1    21                 0    1      0     R      -1
36       1    1    22                 0    1      0     CIAR   +20
38       1    1    23                 0    1      0     R      T
3C       0    0    24                 0    5      0     R      S
3D       1    1    25                 0    5      0     R      S
3E       1    1    26                 0    1      0     R      I
3F       1    1    27                 0    1      0     R      I
40       0    0    28                 0    5      0     R      S      DFLG42,43,46
41       0    0    29                 0    5      0     R      S      DFLG42,43,46
42       0    0    2A                 0    5      0     R      S      DFLG42,43,46
44       0    0    2B                 0    5      0     R      S      DFLG42,43,46
45       0    0    2C                 0    5      0     R      S      DFLG42,43,46
46       0    0    2D                 0    5      0     R      S      DFLG42,43,46
48       0    0    2E                 0    5      0     R      S      DFLG42,43,46
49       0    0    2F                 0    5      0     R      S      DFLG42,43,46
4B       0    0    30                 0    5      0     R      S      DFLG42,43,46
4C       0    0    31                 1    30     25    R      S      DFLG41,42,43,46
4D       0    0    32                 0    1      0     I      0
4E       0    0    33                 0    1      0     R      I
4F       0    0    34                 1    30     25    R      S      DFLG41,42,43,46
50       0    0    35                 0    5      0     0      R      DFLG46
TABLE 3.2-3 INSTRUCTION CODES (Cont.)

         M64  M64  CONTROL                 CYCLE  BUSY  A      B      OUTPUT
INSTR    IN   OUT  ADDRESS  G-BITS    DIV  TIME   TIME  TRUNK  TRUNK  CONTROL

51       0    0    36                 0    5      0     0      R      DFLG46
52       0    0    37                 0    5      0     0      R      DFLG46
53       0    0    38                 1    30     26    0      R      DFLG43,45,46
54       0    0    39                 0    5      0     S      R      DFLG42,43,46
55       0    0    3A                 0    5      0     S      R      DFLG42,46
56       1    1                       0    4      0     R      S
58       0    0    3B                 0    1      0     R      0
59       0    0    3C                 0    5      0     0      R      DFLG42,43,46
5A       0    0    3D                 0    3      0     0      R
5B       0    0    3E                 0    3      0     R      S
5C       0    0    3F                 0    5      0     0      R      DFLG43,46
5D       0    1    40                 0    5      0     0      R      DFLG43,46
60       1    1    41                 0    5      0     R      S      DFLG42,43,46
61       1    1    42                 0    5      0     R      S      DFLG42,43,46
62       1    1    43                 0    5      0     R      S      DFLG42,45,46
63       1    1    44                 0    1      0     R      S
64       1    1    45                 0    5      0     R      S      DFLG42,43,46
65       1    1    46                 0    5      0     R      S      DFLG42,43,46
66       1    1    47                 0    5      0     R      S      DFLG42,43,46
67       1    1    48                 0    1      0     R      S
68       1    1    49                 0    5      0     R      S      DFLG42,43,46
69       1    1    4A                 0    5      0     R      S      DFLG42,43,46
6B       1    1    4B                 0    5      0     R      S      DFLG42,43,46
6C       1    1    4C                 1    54     49    R      S      DFLG41,42,43,46
6D(1)    1    1    4D                 0    4      0     R      S
6D(2)    1    1    4E                 0    3      0     T      0
6E       1    1    4F                 0    3      0     R      S
6F       1    1    50                 1    54     49    R      S      DFLG41,42,43,46
70       1    1    51                 0    5      0     0      R      DFLG64
71       1    1    52                 0    5      0     0      R      DFLG64
72       1    1    53                 0    5      0     0      R      DFLG64
73       1    1    54                 1    54     50    0      R      DFLG43,45,46
74       1    1    55                 0    5      0     S      R
75       1    1    56                 0    5      0     S      R
76       1    1    57                 0    5      0     0      R
77       1    0    58                 0    5      0     0      R
78       1    0    59                 0    1      0     R      0
79       1    1    5A                 0    5      0     0      R
7A       1    1    5B                 0    5      0     R      0
7B       1    1    5C                 0    3      0     R      S
7C       1    1    5D                 0    3      0     R      0

Notes: The 6D instruction requires three references to the
       Register File; this takes two minor cycles. The "(1)" is the
       first and the "(2)" is the second.
TABLE 3.2-3 INSTRUCTION CODES (Cont.)

         M64  M64  CONTROL                 CYCLE  BUSY  A      B      OUTPUT
INSTR    IN   OUT  ADDRESS  G-BITS    DIV  TIME   TIME  TRUNK  TRUNK  CONTROL

B0,G1=0  1    1    60       G1,2,3,4  0    3      0     A      X
B0,G1=1  1    1    70       G1,2,3,4  0    5      0     A      X
B1,G1=0  1    1    61       G1,2,3,4  0    3      0     A      X
B1,G1=1  1    1    71       G1,2,3,4  0    5      0     A      X
B2,G1=0  1    1    62       G1,2,3,4  0    3      0     A      X
B2,G1=1  1    1    72       G1,2,3,4  0    5      0     A      X
B3,G1=0  1    1    63       G1,2,3,4  0    3      0     A      X
B3,G1=1  1    1    73       G1,2,3,4  0    5      0     A      X
B4,G1=0  1    1    64       G1,2,3,4  0    3      0     A      X
B4,G1=1  1    1    74       G1,2,3,4  0    5      0     A      X
B5,G1=0  1    1    65       G1,2,3,4  0    3      0     A      X
B5,G1=1  1    1    75       G1,2,3,4  0    5      0     A      X
BE       1    1    76                 0    1      0     0      I
BF       1    1    77                 0    1      0     I      R
CD       0    0    78                 0    1      0     0      I
CE       0    0    79                 0    1      0     I      R
Output Trunk


This trunk is 64 bits wide. It transmits output
data to the Map and Swap Units. The data formats for
32 and 64-bit mode are as shown below. Data will
remain on this trunk for 10 nsec.




              0          15 16                                  63
64-Bit Mode   | exponent   | coefficient                          |

              0      7 8          31 32    39 40                63
32-Bit Mode   | expo-  | coefficient | expo-  | coefficient       |
              | nent   |             | nent   |                   |
                                     \____________________________/
                                             copy of 00-31

Output Control Trunk

The output control trunk transmits control or fault
bits associated with results generated by the Scalar
Floating-Point Unit. These signals come up with
data and are held up for 10 nsec. The following
signals are transmitted on the output control trunk:

Signal                 Meaning of a "1" on Signal Line

Branch Cond. Met       The operands meet the compare
                       condition. This line is zero
                       when a compare is not being
                       done.

Exit Cond. Met         The operands do not meet the
                       compare condition. This line
                       is zero when a compare is not
                       being done.

Divide Timing Pulse    Divide operands will follow
                       this timing pulse by 14 cycles.

Divide Busy            The divide element cannot accept
                       new operands during the time
                       this signal is a "1".

Data Flags 39, 41,     See specification 10354636 for
42, 43, 45, 46         these definitions.

Instruction Conflicts

Due to the various instruction cycle times,
conflicts may arise at the output of the Floating
Point Interface and within the unit. Floating Point
operations must not be initiated on cycles which
will cause conflicts. The following procedure can
be used to determine these conflict cycles:

$C_A$ = the cycle at which operation A is
initiated.

$L_A$ = the number of cycles operation A spends in
floating point.

$C_B$ = the cycle at which operation B is
initiated.

$L_B$ = the number of cycles operation B spends in
floating point.

If operation B is initiated after operation A then
$C_B \neq C_A + L_A - L_B$ to avoid a conflict.

In addition it must be remembered that no divide
instruction may be initiated if the busy time has
not expired from a previous divide.
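
The conflict rule above can be illustrated with a short Python sketch. The function names and the example cycle counts are hypothetical, and the sketch assumes that a divide's busy time expires exactly busy_time cycles after the divide was initiated; the specification does not spell out that boundary.

```python
def conflicts(c_a: int, l_a: int, c_b: int, l_b: int) -> bool:
    """True if B, initiated at cycle c_b, would finish on the same cycle
    as A, initiated at c_a (that is, c_b == c_a + l_a - l_b)."""
    return c_b == c_a + l_a - l_b


def divide_blocked(c_prev_divide: int, busy_time: int, c_new: int) -> bool:
    """True if a new divide at c_new falls inside the previous divide's
    busy time (assumed to end at c_prev_divide + busy_time)."""
    return c_new < c_prev_divide + busy_time


def earliest_issue(c_a: int, l_a: int, l_b: int) -> int:
    """First cycle after c_a at which B can be initiated without
    colliding with A at the output."""
    c_b = c_a + 1
    while conflicts(c_a, l_a, c_b, l_b):
        c_b += 1
    return c_b
```

For example, with a 5-cycle operation A initiated at cycle 0, a 3-cycle operation B must not be initiated at cycle 2, since both results would emerge on cycle 5.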



Scalar Floating Point Maintenance Aid 

This feature, currently under study, would allow the
display of pertinent registers in floating point
during multi-cycle instructions.

The Scalar Floating Point Unit is controlled by two
separate microcodes. Each of these microcodes
provides control information to the integrated
circuit logic to implement the instruction being
performed. By altering this control information,
i.e., reloading the microcode memories with specially
modified microcodes, the contents of internal
registers could be transmitted, unaltered, to the
output of the unit. Maintenance software would use
this information for display or fault isolation.

Display:

Only the registers critical to that instruction
would be displayed, grouped in timing ranks. It may
be possible with a multi-pipe machine to display
comparisons.

Additional information on this will be provided at a
later date.

3.2.9 Absolute Bounds Address 

The absolute bounds address mechanism provides the 
facility to notify the MCU of a memory reference 
(read or write) inside a specified block of memory. 
The block of memory is specified by an upper bounds 
sword address and a lower bounds sword address. 

Note that the addresses are absolute physical sword
addresses transmitted from the MCU. The bounds
addresses are defined as not included in the block of
memory.

The classes of reference are: 

1) Vector Read and/or Write Requests 

2) CPU and/or Swap Requests. 

Bounds checking is effectively disabled if either
(or both) class 1 or 2 has neither possibility
selected.

The Checker can selectively test various classes of 
requests for in-bounds conditions. Any combination 
of classes may be selected. 

If the FMP has been stopped by a bounds hit, the hit
must be cleared by the clear fault signal from the
MCU before the FMP can be restarted. The FMP can be
restarted to execute the next instruction in
sequence.

The occurrence of a bounds hit (i.e., a selected
memory reference inside bounds) is signaled to the
MCU. To identify a second bounds hit, the MCU must
clear the first bounds hit signal via the clear fault
signal.

When a bounds hit is made, the sword address of the
causing request is saved in the bounds hit register
until a Master Clear or Fault Clear occurs.

The bounds limits and the bounds hit address refer
to physical addresses, which are independent of all
Memory Degradation modes. (The bounds test is
applied to the address after any Degradation mode
manipulation has been applied).
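
The behaviour described in section 3.2.9 can be summarized in a small Python model. This is only a sketch under stated assumptions: the class name, the four request-kind strings and the method names are hypothetical, and the model assumes that references arriving while an earlier hit is still latched are simply not recorded.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BoundsChecker:
    lower: int                       # lower bounds sword address (excluded)
    upper: int                       # upper bounds sword address (excluded)
    selected: set = field(default_factory=set)   # e.g. {"vector_read", "cpu"}
    hit_address: Optional[int] = None            # models the bounds hit register

    def reference(self, kind: str, sword_address: int) -> bool:
        """Return True if this memory reference raises a new bounds hit."""
        if kind not in self.selected:
            return False             # class of request not being tested
        if self.hit_address is not None:
            return False             # first hit must be cleared by the MCU
        if self.lower < sword_address < self.upper:
            self.hit_address = sword_address
            return True
        return False

    def clear_fault(self) -> None:
        """Models the clear fault signal from the MCU."""
        self.hit_address = None
```

With an empty selection set, no reference can ever hit, which corresponds to bounds checking being effectively disabled.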
3.2.10 Trace Register

Register File address zero is used as the trace
register. The trace register contains the address
from which the most recent branch was taken.

Register zero can be referenced by executing a 7D
instruction. See the instruction specification for
the mode of the 7D instruction which will move
register zero to Main Memory. The maintenance
station can read register zero by storing the
Register File and reading absolute zero from
memory. After a job to monitor exchange, the job's
address zero in memory contains the address of the
last branch taken prior to the exchange operation.
After a monitor to job exchange, monitor's address
zero (absolute zero) contains the address of the last
branch taken prior to the exchange operation.
3.3 Vector Processor

The Vector Processor consists of three distinct
subsystems: the Map Unit, the Buffer Unit and the
Vector Floating-Point Ensemble (VFPE). The Map Unit
is a single homogeneous logical element which
controls all memory accesses by vector operations,
and performs certain limited functions (such as
Transmit Vector and Scatter/Gather) itself. The
Floating-Point Ensemble is designated an ensemble
because it consists of a set of nine identical
arithmetic units, all of which operate in lock-step
synchronization. One of the units is designated by
control signals from the Maintenance Control Unit
(MCU) as the auxiliary or spare unit. Normally, this
unit will be performing the same functions as the
remaining eight units, utilizing data inputs common
to one of the operational units, but with its output
data ignored. The self-checking circuits internal to
that unit then can be exercised continuously even
though the unit is off-line.

The Buffer Unit is physically part of the VFPE but
is treated as a logical entity. It has nine
identical sections, each of which is directly
associated with one of the nine Vector Units in the
VFPE.

The Vector Processor runs under its own local
control. That is, the Instruction Issue Unit in the
Scalar Processor passes sufficient information to
the Vector Processor so that it can proceed
independently. No active control is required from
the Issue Unit. When the Vector Processor is given
a process to perform, it checks for resources
required and, if available, sets up and performs the
required operations. If the resources are not
available, the setup information is held in a
one-word queue until the resources become available.
When the Issue Unit finds the queue full, it
suspends issuing to the Vector Processor until the
queue is emptied.

The Vector Processor queue is, in fact, three
queues - one each for the Vector, Map, and Buffer
Units. The Buffer and Map Unit queues are further
broken down according to the separate resources of
each unit. Thus, if an individual resource is
available, it immediately tries to perform the
desired function.

For example, if the Map Unit is requested to get two
vector streams from memory to be added in the VFPE
and to store the results in the Buffer Unit, the
Issue Unit sends information to set up the two read
busses, R1 and R2, as part of the instruction issue.
If R2 is in use at the time of the issue, the setup
information for R2 is held in its queue and the R1
setup is performed (provided R1 was not in use). R1
then makes memory requests but, because the current
operation which has R2 in use does not require R1
data, no data moves through R1.

As another example, consider a vector stream from
memory, via R1 in the Map Unit, being added in the
Vector Units to a data stream from the Buffer Unit
with the result going back to memory. Concurrent
with this the Issue Unit can send setup information
to R2 and S2 in the Map Unit and WB1 in the Buffer
Unit to cause a load of the buffer from memory.
Because all of these resources - R2, S2, and WB1 -
are not in use the setups are performed and the
buffer starts loading. No conflicts occur because
of the vector add being executed in parallel.

Holding its own setup information locally, a
resource has two additional requirements in order to
perform a function: valid data at its input and a
place that will accept the processed output. This
then is the control system for the Vector Processor -
when valid data is presented at the input to a
function resource, if the resource has been set up
to perform an operation it sends an "accept" to the
sending resource, and some number of cycles later
produces valid output data. If the receiving
resource is able to take the data it does so. If,
however, proper setup of the receiving resource has
not as yet been fully accomplished, acceptance of the
data is not forced. If a given resource does not
receive an "accept", it does not send an "accept" to
the resource supplying its input. Valid data is
indicated by a "valid" signal on a single line
(called the valid line) that accompanies the data.
The "accept" signal is also a single line (called
the accept line).


Thus, a complete operation consists of setting up a
complete "valid" chain from things that can source
"valids" (Main Memory and Buffer Unit) to things
that can sink "valids" (also Main Memory and Buffer
Unit). If a complete "valid" chain is established
for a given operation, that operation will proceed
to completion. In most cases completion is
determined by the write length of the output vector
going to zero. When this occurs a signal is sent
backward along the "valid" chain (in the "accept"
direction) stopping the generation of "valid" and
"accept" signals. However, operations being
performed, memory addresses being referenced, and
the "valid" connections are maintained. Thus, only
the changes from one operation to another need be
sent to start the next operation.
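
The valid/accept control scheme can be pictured with a minimal cycle-by-cycle simulation in Python. This is a sketch, not the hardware design: the three-stage chain (source, one holding register, sink), the function name and the sink_ready_at parameter are all assumptions introduced for illustration.

```python
def run_chain(items, sink_ready_at, n_cycles):
    """Simulate source -> one-register stage -> sink with valid/accept lines.

    items:         operands waiting in the source (for example, Main Memory).
    sink_ready_at: cycle on which the sink's setup is complete.
    Returns the (cycle, item) pairs taken by the sink.
    """
    source = list(items)
    stage = None              # single holding register of the middle resource
    received = []
    for cycle in range(n_cycles):
        # The stage raises "valid" whenever it holds data; the sink returns
        # "accept" only once it has been set up.
        if stage is not None and cycle >= sink_ready_at:
            received.append((cycle, stage))
            stage = None
        # The stage returns "accept" to the source only if its register is
        # free, so a stalled sink holds up the whole chain.
        if source and stage is None:
            stage = source.pop(0)
    return received
```

With the sink set up late, for example run_chain(["a", "b", "c"], 3, 8), the first operand waits in the stage until cycle 3 and the rest then follow at one per cycle; nothing is lost and nothing is forced on the receiver.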

3.3.1 Vector Floating-Point Ensemble (VFPE) 

Figure 3.3-1 provides a simplified block diagram of
a single Vector Unit in the ensemble. Each unit is
completely independent of another, with no
interconnections between them for data or control.
All incoming and outgoing control passes between each
unit and the Map Unit or the Scalar Processor. Each
Vector Unit contains two full multiplier and adder
elements and two half-adder elements, each of which
is capable of operating on pairs of 64-bit input
operands or quartets of 32-bit operands every clock
cycle. Each arithmetic element (add, multiply) is a
segmented pipeline, three segments per element.
Each segment requires one clock cycle of pipeline
time. Thus two operands proceeding through all three
segments of each element for a combination add and
multiply ((A+B)*C) would require nine minor cycles to
pass from the select network to the result busses,
Arithmetic Result 1 (AR1) and Arithmetic Result 2
(AR2). A simple, normalized ADD operation utilizes the
front-end add elements (FADD1 or FADD2), bypasses the
multiply elements and completes the addition and
post-normalization in the back-end add elements
(BADD1 or BADD2). The total for a simple,
normalized ADD is six segments; a simple MULTIPLY
operation also takes six segments of pipeline time.
This pipeline length contributes to vector startup
time as described in section 3.9.2.
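
The segment counts quoted above follow from simple arithmetic, sketched here in Python. The path table and function names are hypothetical, and cycles_for_vector assumes one result per minor cycle once the pipeline is full, ignoring every other source of startup time.

```python
SEGMENTS_PER_ELEMENT = 3   # each add or multiply element has three segments

# Elements traversed, in order, for the operations mentioned in the text.
PATHS = {
    "ADD":      ["FADD", "BADD"],          # multiply elements bypassed
    "MULTIPLY": ["MUL", "BADD"],           # products summed in the back-end adder
    "(A+B)*C":  ["FADD", "MUL", "BADD"],
}


def latency(op: str) -> int:
    """Minor cycles from the select network to the result bus."""
    return SEGMENTS_PER_ELEMENT * len(PATHS[op])


def cycles_for_vector(op: str, n: int) -> int:
    """Cycles until the last of n results emerges from a full pipeline."""
    return latency(op) + (n - 1)
```

Under these assumptions a 100-element vector ADD would deliver its last result 105 minor cycles after the first operand pair enters the select network.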



CONTROL DATA CORPORATION   ENGINEERING SPECIFICATION   NO. 10354637   DATE Dec. 1977   PAGE 37   REV.   RADL

Figure 3.3-1 One Vector Unit
3.3.1.1 Read Bus Select Elements

There are four input data busses for each Vector
Unit: RB1 (Read Bus 1 from the Map Unit),
RB2 (Read Bus 2 from the Map Unit), S1 (Source 1
from the Buffer Unit), and S2 (Source 2 from the
Buffer Unit). Each input bus is capable of
supplying operands to any or all four of the
functional streams (Bus A, Bus B, Bus C, Bus D) which
feed the various arithmetic elements. As can be
seen from Figure 3.3-1 then, any combination of input
busses can be fed to any of the arithmetic elements,
permitting such combinations to occur as (A*A)+(B*B)
by supplying the A stream from the Buffer Unit (for
example) via S1 and selecting it through SELECT A and
SELECT B to the Bus A and Bus B sides of the multiply
element. Likewise the B operands could be supplied
from the Buffer Unit via S2 and selected through
SELECT C and SELECT D to the C and D sides of the
second multiply element (MUL2). The results of MUL1
and MUL2 would then be combined in the final back-end
adder BADD1, to form the sum of the two products.

The read bus select elements SELECT A, B, C and D
are individually controlled by the C, D, E and F
fields of the 9F (Vector Arithmetic) instruction
which is interpreted by the Issue Unit and
transmitted to the Vector Unit.
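
The (A*A)+(B*B) routing example can be traced in a few lines of Python. This is an illustrative sketch only: the dictionary of select settings is a hypothetical stand-in for the C, D, E and F fields of the 9F instruction, and ordinary Python numbers replace the machine's floating-point operands.

```python
def sum_of_squares(s1, s2):
    """Elementwise (A*A)+(B*B) for streams arriving on busses S1 and S2."""
    # Hypothetical select settings: which input bus feeds each functional stream.
    select = {"A": "S1", "B": "S1", "C": "S2", "D": "S2"}
    inputs = {"S1": s1, "S2": s2}
    results = []
    for i in range(len(s1)):
        bus = {name: inputs[src][i] for name, src in select.items()}
        mul1 = bus["A"] * bus["B"]     # MUL1 takes Bus A and Bus B
        mul2 = bus["C"] * bus["D"]     # MUL2 takes Bus C and Bus D
        results.append(mul1 + mul2)    # BADD1 forms the sum of the two products
    return results
```

Because the same input bus is selected onto two functional streams, each stream is squared without being fetched twice.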

3.3.1.2 Write Bus Select Elements

On any given clock cycle a Vector Unit can
transmit one 64-bit or two 32-bit result operands to
the Map Unit for storage in memory via the WRITE 1
Bus. On any given clock cycle the BADD1 and BADD2
elements (back-end add elements) can each produce one
64-bit or two 32-bit results, which are
placed on their respective Arithmetic Result busses
(AR1 and AR2). The results appearing on these two
busses are defined by the suboperation codes for the
9F (Vector Arithmetic) instruction. The N field of
the 9F instruction (section 3.2.1.160 of the
Instruction Specification) directly controls which
result bus, AR1 or AR2, or no bus, is connected to
the Write Bus, W1.
3.3.1.3 Front-End Add Elements (FADD1 & FADD2)

Two identical arithmetic elements form the front-end
functional processors of each Vector Unit. These
elements are composed of a prenormalize network which
aligns operands of unlike exponents, plus a full
two's complement adder producing one 64-bit or two
32-bit results every minor cycle. There is no
post-normalization shift network present in these
elements. The output result from such an element is
the equivalent of the FMP ADD or SUBTRACT UPPER or
LOWER, with no normalize shifts being done on the
result data.

The primary function of these adders in primitive
operations (diadic arithmetic such as A*B) is to
perform the pre-normalization of input operands
(particularly for the divisor in divide operations)
and to provide for complementing of one or more
operands for functions such as (-A*B).

Each FADD element has its own independent microcode
control so that diagnostics can be loaded via the
microcode trunk to perform failure isolation to the
lowest replaceable component level (LSI chip).

In addition to the pre-normalization of the divisor
in divide operations, the FADD elements perform the
necessary complementation of negative source
operands prior to performing the table look-up that
initiates the reciprocal approximations.

3.3.1.4 Multiply Elements (MUL1 & MUL2)

Each Vector Unit contains two identical multiply
elements each with its own independent control logic.
The multiply element inputs two 64-bit or four
32-bit operands and produces one 64-bit or two
32-bit results every clock cycle. This multiply
operation is performed in three segments, each of
which requires a minor cycle. In the first segment,
four-bit groups of the multiplier are used to
encode 8-bit groups of the multiplicand into a
series of partial sums and carries. For the
remaining two segment times, these partial sums and
carries are merged through a series of partial
adders yielding a 96-bit wide, final product of
partial sums and carries which are finally added
together in the back-end adder (BADD1 or BADD2).

This addition operation produces a 96-bit wide
coefficient result which can either be normalized,
truncated, rounded, or left in upper or lower
form (for double-precision arithmetic).

Inputs to the multiplier are controlled by the 
subfunction operations specified in the 9F (Vector 
Arithmetic) Instruction, and can come from the read 
bus select networks, the front-end adders, the 
divide table element (for divide operations), or from 
one of the arithmetic result busses emerging from the 
back-end adders depending on which operations, such 
as PRODUCT, are desired. If one of the two MUL 
elements is not specified in the suboperation, then 
identical inputs are selected for both elements and
checking is enabled.

3.3.1.5 Back-End Adder Elements (BADD1 and BADD2)

Each Vector Unit contains two identical back-end
adder units, each with its own independent control
logic. The back-end adder consists of a rank of
deskew logic for synchronizing the various partial
sums and carries from the multiply elements, and a
full three-input adder capable of combining the
multiply output results with the output of either
FADD1 or FADD2 or the other multiplier element. This
function provides facilities such as (A*B)+C or
(A*B)+(C*D).

Each back-end adder performs a 96-bit (in 64-bit 
operand mode) or two 48-bit (in 32-bit operand mode) 
coefficient addition every minor cycle. The first 
segment contains the latches and first addition of a 
pair of operands. The second segment contains the 
second addition of the resulting input pair of 
operands plus the final group carry/generate tree, 
and the final segment contains the 
rounding/truncation logic and post-normalization 
network. Post-normalization is controlled by the type
of operation specified in the 9F instruction.

Inputs to back-end adders are controlled by the
subfunction code in the 9F (Vector Arithmetic)
instruction. The outputs are placed on the
arithmetic result busses AR1 and AR2 by the adders
BADD1 and BADD2 respectively. In addition, each of
the result busses is connected directly to the input
selection network of the Buffer Unit.

3.3.1.6 Divide Table Element 

The divide operation utilizes most of the arithmetic
elements in the Vector Unit. To achieve a divide
rate of one result per minor cycle (for 24-bit
coefficient accuracy), the reciprocal divide
approximation is utilized. In this mode, the divisor
is pre-normalized and its absolute value yielded by a
front-end adder. This resulting divisor is then
sampled by taking 12 bits of the coefficient from the
left-most (or most significant) end, not including
the sign (which will always be zero since the
absolute value of the divisor is used), and not
including the most significant bit (which will always
be one since a normalized divisor is used), yielding
bits 18-29 of the 64-bit coefficient. These twelve
bits are used to address a read-only memory (ROM), or
look-up table, called the divide table element. A
39-bit word (plus one parity bit) is read from the
ROM at that address. The word is partitioned into
two fields, S (14 bits) and T (25 bits). The T field
is used as input to the other front-end adder (for
complementation if the divisor was originally
negative), and the S field is used as input to a
multiplier to form the product of S times the
remaining bits of the coefficient (the 34 bits not
used in the table look-up). The multiplied result is
subtracted from T in the back-end adder and that
result is then fed into the other multiplier along
with the dividend to form a 64-bit result of which
24 bits of the coefficient are accurate. The pair of
results out of the back-end adders can be stored in
the buffers or memory, thence retrieved during a
second pass (DIVIDE 2) to perform the necessary
corrections to produce a full 64-bit result.

Figure 3.3-2 shows the interconnection for the first
pass divide operation (DIVIDE 1) which yields a
correct 24-bit coefficient result. Not shown is a
network which transmits the lower 34 bits of the
divisor with the upper 14 bits cleared to zero. It
is this quantity which is multiplied times the slope
(S) value to form the first product in the reciprocal
approximation.

Figure 3.3-3 shows the interconnection scheme for the
second pass divide operation (DIVIDE 2) which is
used to produce 64-bit floating-point quotient
results. The input operands required for this second
pass are: the first pass quotient (which is by
itself adequate for 32-bit arithmetic), the original
divisor, and the intermediate product which is
normally stored in the Buffer Unit.

The divide table element is referenced once each
minor cycle during the DIVIDE 1 operation. This
means that when in 32-bit mode, the divide rate is
the same as for 64-bit mode during the first pass,
one result per minor cycle. Usually however, the need
for 48-bit accuracy in the coefficient portion of the
64-bit result will require the DIVIDE 2 pass which
then creates a true 64-bit divide rate of one result
every two minor cycles per Vector Unit.
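
The first-pass reciprocal approximation (a table look-up on 12 leading bits, followed by T minus S times the remaining bits) can be imitated in Python. This is a numerical sketch, not the hardware algorithm: it uses ordinary floating-point values rather than the 14-bit S and 25-bit T fields, it assumes the divisor is normalized to the range 1 <= d < 2, and the way the table entries are computed (a straight line through the reciprocal at both ends of each interval) is an assumption, since the specification does not give the table contents.

```python
INDEX_BITS = 12
STEP = 2.0 ** -INDEX_BITS          # width of one table interval on [1, 2)


def build_table():
    """One (slope S, value T) pair per interval of the normalized divisor."""
    table = []
    for k in range(1 << INDEX_BITS):
        d0 = 1.0 + k * STEP
        d1 = d0 + STEP
        t = 1.0 / d0                        # reciprocal at the left edge
        s = (1.0 / d0 - 1.0 / d1) / STEP    # magnitude of the slope
        table.append((s, t))
    return table


TABLE = build_table()


def reciprocal(d: float) -> float:
    """Approximate 1/d for 1 <= d < 2 as T - S * (bits not used in the look-up)."""
    k = int((d - 1.0) / STEP)               # the 12 leading fraction bits
    remainder = d - (1.0 + k * STEP)        # the remaining low-order bits
    s, t = TABLE[k]
    return t - s * remainder


def divide_pass1(dividend: float, divisor: float) -> float:
    """First-pass quotient: the dividend times the approximate reciprocal."""
    return dividend * reciprocal(divisor)
```

With 4096 intervals, straight-line interpolation of 1/d keeps the relative error below about 2**-24 in this model, which is in line with the 24-bit coefficient accuracy quoted for DIVIDE 1.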
Figure 3.3-2 First Pass for 32-Bit Divide

Figure 3.3-3 Second Pass for 64-Bit Divide



















3.3.1.7 Error Checking

3.3.1.7.1 SECDED

Each of the trunks entering and leaving a Vector Unit
and connecting to the Map Unit contains SECDED (single
error correction, double error detection) networks.
The Read Bus 1 (RB1) and Read Bus 2 (RB2) inputs contain
SECDED detection and correction circuits, while the
Write Bus 1 trunk contains a SECDED code generation
network.

SECDED is carried on a 32-bit basis, seven bits for
each 32 bits of data. Thus all input and output
trunks possess 78 actual bits of transmitted data.
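
One common way to obtain seven check bits per 32 data bits is an extended Hamming code: six Hamming check bits plus one overall parity bit. The Python sketch below shows that construction purely as an illustration. The specification does not say which SECDED code the trunks use, so the bit layout and function names here are assumptions.

```python
CHECK_POS = (1, 2, 4, 8, 16, 32)                  # Hamming check-bit positions
DATA_POS = [p for p in range(1, 39) if p not in CHECK_POS]   # 32 data positions
WIDTH = 39                                        # 32 data + 6 check + 1 parity


def encode(data):
    """Return a 39-bit code word (list of bits) for a 32-bit data value."""
    word = [0] * WIDTH
    for i, pos in enumerate(DATA_POS):
        word[pos] = (data >> i) & 1
    for p in CHECK_POS:
        word[p] = sum(word[k] for k in range(1, WIDTH) if k & p) % 2
    word[0] = sum(word) % 2                       # overall parity: total is even
    return word


def decode(word):
    """Return (status, data); status is 'ok', 'corrected' or 'double'."""
    word = list(word)
    syndrome = 0
    for k in range(1, WIDTH):
        if word[k]:
            syndrome ^= k                         # XOR of positions holding a 1
    odd = sum(word) % 2
    if odd and syndrome < WIDTH:
        word[syndrome] ^= 1                       # single error (0 = parity bit)
        status = "corrected"
    elif odd or syndrome:
        status = "double"                         # detected, not correctable
    else:
        status = "ok"
    data = sum(word[pos] << i for i, pos in enumerate(DATA_POS))
    return status, data
```

Two such 39-bit code words side by side give 78 bits, which matches the trunk width stated above.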

3.3.1.7.2 Parity

The divide table element consists of a loadable RAM
that behaves as a read only memory during normal
Vector Unit operation. Each 39 bits of divide table
data have a single parity bit associated with them.

Upon each table read, the parity is checked. If an
error occurs, the Vector Unit is immediately halted
and the Maintenance Control Unit (MCU) is alerted by
an error flag. In addition, the Scalar, Map, and
Swap Units are sent stop signals.

Upon command of the MCU the Vector Unit can transmit
the failing memory location in the divide table, the
operand location in the input vectors for the
failing case, and the P counters of all the control
microcodes for the Vector Unit, to assist in
maintenance actions.

Each of the microcode memories contains a parity bit
for each word addressed. In the event that a parity
error occurs, the microcode sequence is frozen and
the P counter transmitted to the MCU on command. A
flag indicating which microcode is failing is sent
to the MCU. The Map, Scalar, and Swap Units are
also sent stop signals.





3.3.1.7.3 Result Checking

Each Vector Unit is supplied with three coincidence
checking networks, capable of comparing the results
produced by the identical pairs of arithmetic
elements. CHECK 1 compares the outputs of FADD1 and
FADD2, CHECK 2 compares the outputs of MUL1 and MUL2,
and CHECK 3 compares the results of the final adders
BADD1 and BADD2. Checking is enabled under the
following circumstances.

1. When the same input trunks are selected into
   the pair of operand ports A&C and B&D, and
   the identical functions are selected for the
   pair of elements FADD1, FADD2 or MUL1, MUL2
   or BADD1, BADD2.

2. When a given element is idled during an
   operation. For example, the suboperation
   code 02 would invoke the operation A+B and
   C+D thus idling the multiply elements. In
   this case a pair of operands emerging from
   the front-end adders would be
   enabled into both MUL1 and MUL2
   automatically by the Vector Unit. The
   multiplied output, although meaningless to
   the programmer, would be checked by the
   checking network.

3. When one of a pair of elements is idled by a
   particular suboperation code. For example
   the suboperation code 05 would cause the
   operation (A+B)*D, thus MUL2 would be idle.
   In this case the Vector Unit would
   automatically enable the same pair of inputs
   to both multiply elements. The checker
   would then be enabled.

It can be seen that the programmer can explicitly
control checking in some cases by setting the
appropriate fields in a 9F add instruction to select
identical operands to identical elements.

In the event that an enabled checker discovers a
mismatch in the output data, the Vector Unit is
halted, a stop signal is sent to the Map, Swap, and
Scalar Units, and the MCU is alerted.
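
The coincidence checking idea (run both elements of a pair on identical inputs and compare) can be sketched in Python as follows. The names run_pair and CheckError are hypothetical, and raising an exception merely stands in for halting the Vector Unit and alerting the MCU.

```python
class CheckError(Exception):
    """Raised where the hardware would halt the Vector Unit and alert the MCU."""


def run_pair(element_1, element_2, operands_1, operands_2=None):
    """Run a pair of identical elements and compare results when possible.

    If the second element is idle (operands_2 is None) it is fed the first
    element's operands, so the checker is enabled. The checker is also
    enabled when both elements were given identical operands explicitly.
    """
    check = operands_2 is None or operands_2 == operands_1
    if operands_2 is None:
        operands_2 = operands_1
    r1 = element_1(*operands_1)
    r2 = element_2(*operands_2)
    if check and r1 != r2:
        raise CheckError(f"mismatch: {r1!r} != {r2!r}")
    return r1, r2
```

When the two elements are working on different operands there is nothing to compare, so a fault in one of them goes unnoticed by this check; that is why the text notes that a programmer can select identical operands to force checking.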
3.3.1.8 32/64-Bit Arithmetic

Each Vector Unit is capable of processing two 32-bit
or one 64-bit results each minor cycle in each of
its arithmetic element segments, except for the
divide table which produces one 32-bit result per
cycle.

Each arithmetic element except for the divide table
can also process a combination of one 64-bit and one
32-bit operand each minor cycle as input to an
operation. For example, the FADD1 element could be
accepting a 64-bit input operand on its A trunk and
a 32-bit operand on its B trunk. In this mixed mode
FADD1 would produce a 64-bit result.

Each of the input trunks, from either the Map Unit or
Buffer Unit, provides a flag indicating what mode that
particular trunk is operating in, either 64 or
32-bit. The Vector Unit then automatically configures
its arithmetic elements to accept that form of data
on that trunk.

The output trunks to the Map Unit and Buffer Unit
also provide a flag to the Vector Unit indicating
what mode they expect their operands to be in. Thus
the Vector Unit is responsible for the necessary
truncation or expansion of data to match that format
required by the receiving units.
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3 • 3 ,1 . 8 (Con t . ) 

Floating-point numbers in the CDC FMP are of two
lengths, 32 bits and 64 bits. The 32-bit format has
an 8-bit exponent and a 24-bit coefficient (Figure
3.3-4). The 64-bit format has a 16-bit exponent and
a 48-bit coefficient. The left-most bit of each
exponent and coefficient is the sign bit. A
detailed description of floating arithmetic is
presented in the instruction specification.
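
The field layout can be illustrated with a minimal Python sketch.
It only separates a word into its exponent and coefficient fields
and reads the sign bit of each. Since Figure 3.3-4 is not
reproduced here, it is an assumption that the exponent occupies
the left-most bits of the word; the 64-bit split is taken as
16 + 48 so that the fields fill the word. The names split_fields
and FORMATS are hypothetical, and nothing is implied about how
the FMP interprets the fields numerically.

```python
# (exponent bits, coefficient bits) per word width.
# Assumed split for 64 bits: 16 + 48 = 64.
FORMATS = {32: (8, 24), 64: (16, 48)}


def split_fields(word: int, width: int) -> dict:
    """Split a floating-point word into exponent and coefficient fields.

    Assumption: the exponent is the left-most field of the word.
    """
    exp_bits, coef_bits = FORMATS[width]
    if not 0 <= word < (1 << width):
        raise ValueError("word does not fit in the stated width")
    exponent = word >> coef_bits
    coefficient = word & ((1 << coef_bits) - 1)
    return {
        "exponent": exponent,
        "coefficient": coefficient,
        # The left-most bit of each field is its sign bit.
        "exponent_sign": exponent >> (exp_bits - 1),
        "coefficient_sign": coefficient >> (coef_bits - 1),
    }
```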

3.3.1.9 Asynchronous Control

As in all other units, the Vector Unit controls the
movement of operands through its various elements by
request/accept signals. Therefore, as soon as data
ready signals appear at the ports selected by a
particular 9F operation, the Vector Unit will begin
to move the data through its networks. The results
will be placed on the selected output busses, and no
more data will be placed there until an accept is
received from the selected trunk destination. That
is the purpose of the fields in the 9F which
designate which output ports to expect accepts on
during an arithmetic operation.

Likewise, the Vector Unit returns an accept for every
operand it takes from an input port, thus allowing
the unit supplying operands to move a fresh operand
into place on the trunk. In the case of mixed mode
operations where the rate of supply can exceed the
rate that the Vector Unit can process, the accept
flag consists of two bits indicating whether the
lower or upper 32-bit operand has been accepted on
the particular 64-bit trunk.
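
The request/accept discipline can be sketched as a one-deep
trunk model in Python. This is a simplified illustration only:
the class name Trunk and its methods are hypothetical, timing is
ignored, and the two-bit accept flag for mixed mode is not
modeled.

```python
class Trunk:
    """One-deep trunk: holds a single operand until the receiver accepts it."""

    def __init__(self):
        self.data = None
        self.data_ready = False

    def offer(self, operand) -> bool:
        """Sender side: place an operand only if the previous one was accepted."""
        if self.data_ready:
            return False  # no accept received yet, the sender must hold
        self.data, self.data_ready = operand, True
        return True

    def accept(self):
        """Receiver side: take the operand, which frees the trunk (the accept)."""
        if not self.data_ready:
            return None
        operand, self.data, self.data_ready = self.data, None, False
        return operand
```

A sender that receives False from offer() simply retries later,
which mirrors the rule that no more data is placed on a bus until
an accept has been received.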

3.3.1.10 Control Signals

(To be defined later)

3.3.1.11 Microcode Terms

(To be defined later) 

3.3.1.12 Interface Timing 


(To be defined later)

3.3.1.13 Exchange Operation and Interrupts

The purpose of the exchange is to change the prime
role of the CPU. In job mode, job tasks are
performed; in monitor mode, the system decisions are
made.

Some instructions in progress may be interrupted 
prior to their completion. The invisible flags 
stored in the invisible package are used to restart 
the interrupted instruction exactly where its output 
left off. 

Job mode data processing can be monitored, during 
monitor mode, by examining the Stall Bit in Word 8 of 
the job's invisible package. The Stall Bit is a "1"
if no data was processed during the job time-slice
that resulted in the preparation of the invisible 
package. 


Invisible Package 

The invisible package is always stored starting at an
even-numbered sword address. Therefore, the
right-most 10 bits of the starting address of the
invisible package must be zeros. This is as indicated
in the Exit Force instruction write-up in the
Instruction Specification.
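
The alignment rule follows from the sword size: with 512-bit
swords and bit addressing, an even-numbered sword boundary falls
every 1024 bits, so the low 10 address bits are zero. A minimal
Python check, with a hypothetical function name, might read:

```python
SWORD_BITS = 512  # one sword (super-word) is 512 bits


def is_valid_invisible_package_address(bit_address: int) -> bool:
    """True if the bit address lies on an even-numbered sword boundary."""
    return bit_address >= 0 and bit_address % (2 * SWORD_BITS) == 0
```

The modulus test is equivalent to checking that
bit_address & 0x3FF is zero.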

The monitor must set up an invisible package for each
job. There is NO invisible package for the monitor
program itself.

If a job is to be re-entered, the monitor should not 
alter the job's invisible package. 

Figure 3.3-5 shows the format of the invisible 
package. 


Figure 3.3-5 Invisible Package Format

The following notes apply to Figure 3.3-5. 

1 Bits 0-15 and 59-63 are not used and must be set to
zeros.

2 Quantity is loaded or read/stored or written by
the Scalar Processor only.

3 Usage bits for breakpoint register.

4 Quantity is loaded/stored by Vector Processor only.

5 N/A

6 Bit 16  Flag 0
  Bit 17  Flag 1
  Bit 18  Flag 2
  Bit 19  Flag 3
  Bit 20  Interrupt Flag
  Bit 21  NOT USED
  Bit 22  Load/Store 1
  Bit 23  Load/Store 2
  Bit 24  Subfunction bit 0
  Bit 25  Subfunction bit 1
  Bit 26  Subfunction bit 2
  Bit 27  Subfunction bit 3

7 Quantity is loaded/stored by the Vector Processor
only.

8 Words 5, 7, 9, B, D and F are loaded by both the
Scalar and Vector Processor. These words are
stored by the Vector Processor if the vector
restart bit (word 8, bit 0) = 1 and by the Scalar
Processor if the bit = 0.

9 Bits 59-63 are not used and must be set to zeros.

10 Bit 0  Vector restart bit.

The Vector Processor's instruction register
receives bits 0-15, word 6 and bits 16-63, word A.

A vector will restart without reloading the vector
instruction from memory only if bits 16-63, word A
are not needed to restart (Bit 0, Word 8 = 1).

Bit 1  Register file's scalar enable.

(Bits 0 and 1 are loaded by the Scalar
Processor and stored by the Vector Processor).
Bits 2-11 are not used.

Bit 12  Stall bit. This bit is a "1" if no data is
processed.

Bit 13  Fault test instruction enable. For further
information see specification 11845800.

Bit 14  Monitoring counters enable. For further
information see Section 3.7 in this
specification.

Bit 15  ASCII = 0, EBCDIC = 1. (Bits 12-15 are
loaded/stored by the Vector Processor only).

11 Job Interval Timer. Quantity is loaded/stored by
the Vector Processor only.

12 Quantity is stored by the Scalar Processor and
loaded by neither.
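
As an illustration of note 10, the flag bits of Word 8 could be
decoded as in the Python sketch below. Two assumptions are made
and should be checked against Figure 3.3-5: that note 10
describes Word 8 (as the Stall Bit reference suggests), and that
bit 0 is the left-most bit of the 64-bit word. The flag names
are hypothetical labels, not names used by the specification.

```python
WORD_BITS = 64

# Bit positions named in note 10 (labels are illustrative only).
WORD8_FLAGS = {
    0: "vector_restart",
    1: "register_file_scalar_enable",
    12: "stall",
    13: "fault_test_enable",
    14: "monitoring_counters_enable",
    15: "ebcdic",  # 0 = ASCII, 1 = EBCDIC
}


def get_bit(word: int, n: int) -> int:
    """Return bit n, counting from the left-most bit (assumed numbering)."""
    return (word >> (WORD_BITS - 1 - n)) & 1


def decode_word8(word: int) -> dict:
    """Map each named flag of Word 8 to True or False."""
    return {name: bool(get_bit(word, n)) for n, name in WORD8_FLAGS.items()}
```

A monitor could, for example, test decode_word8(word)["stall"]
to see whether a job processed no data during its time-slice.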



Exchange from the Monitor to a Job 

This is always accomplished with an Exit Force
instruction. The monitor program must set up the
invisible package for the job prior to exchanging to
that job via the Exit Force instruction. The Exit
Force operation is as follows:

1. The CPU's invisible registers and flags are
loaded from the invisible package located
starting at the bit address in the monitor's
register T specified by the Exit Force
instruction. This starting address is saved in a
register to provide for storing the current
invisible package when returning to the monitor
program.

2. The Register File for the monitor is stored into
absolute memory locations 0 through 3FC0 (base 16).
The Register File for the job is loaded from the
job's memory locations 100000-103FC0 (base 16).
Any job mode reference to this area of a job's
memory causes the executing instruction to be
treated as an illegal instruction.

3. The CPU mode is changed from monitor mode to job
mode.

4. The contents of P (program address register) are
then read up and an appropriate start
sequence is executed.



Exchange from Job to the Monitor 

The Exit Force instruction and the channel interrupt
are the two normal ways of getting from a job in
job mode to the monitor program in monitor mode.
Attempting to execute a monitor-type instruction in
job mode or attempting to execute an undefined
op-code constitutes the third way into the monitor.
Except for the starting point in the monitor program,
the operations performed in getting to the monitor
are identical for the three.

The operation is as follows: 

1. The current invisible registers and flags are
stored into the invisible package starting at
the same address used to load the invisible
package when the job was entered.

2. The Register File for the job is stored in
memory locations 100000-103FC0 (base 16) and
memory locations 0 through 3FC0 (base 16) are
read and put into the Register File.

3. The CPU mode is changed from job mode to monitor
mode. Any external interrupts which occur after
this point are honored only if the CPU executes
an Idle instruction. If the CPU does not execute
an Idle instruction, the interrupts are saved
until the CPU mode reverts to job mode, or until
the monitor program clears those interrupts with
a 0E (Translate External Interrupt) instruction.

4. The monitor program is executed starting at the
absolute address contained in the right-most 48
bits of the monitor's register 3, 5, 6 or 7.

Refer to Table 3.3-1 for methods of getting from
job to monitor mode.



If an attempt is made by the monitor program to
perform an undefined op-code, an automatic branch is
made to the absolute address contained in the
monitor's register 4. This hardware trap is to aid
in the debugging of the monitor software and to trap
some hardware failures. This trap is not to be
utilized by the monitor software as a "normal"
branch.


TABLE 3.3-1. JOB TO MONITOR METHODS

Method of Getting                     Monitor Register, the
to the Monitor                        Contents of which is
                                      Used to Set P

1. Undefined instruction,             Register 3
   monitor-type instruction in
   job mode, or a reference to
   the Register File as memory
   (bit address 0000-3FFF,
   base 16).

2. Undefined op code in               Register 4
   monitor or reference to the
   Register File as memory
   (bit address 0000-3FFF,
   base 16).

3. Exit Force.                        Register 5

4. Channel Interrupt.                 Register 6
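
The selection in Table 3.3-1 amounts to a lookup followed by
masking the register to its right-most 48 bits. The Python
sketch below is illustrative only: the cause names and the
function monitor_entry_address are hypothetical, and applying
the 48-bit mask to register 4 as well is an assumption.

```python
# Entry cause -> monitor register whose contents set P (Table 3.3-1).
ENTRY_REGISTER = {
    "illegal_in_job_mode": 3,
    "undefined_op_in_monitor": 4,
    "exit_force": 5,
    "channel_interrupt": 6,
}

ADDRESS_MASK = (1 << 48) - 1  # right-most 48 bits of the register


def monitor_entry_address(cause: str, monitor_registers: list) -> int:
    """Return the absolute address at which the monitor starts executing."""
    return monitor_registers[ENTRY_REGISTER[cause]] & ADDRESS_MASK
```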

The bits in the external interrupt register are
assigned as shown in the following table:

TABLE 3.3-2. EXTERNAL INTERRUPT REGISTER BIT ASSIGNMENTS

External Interrupt Line    Assignment

 0                         I/O channel 0
 1                         I/O channel 1
 2                         I/O channel 2
 3                         I/O channel 3
 4                         I/O channel 4
 5                         I/O channel 5
 6                         I/O channel 6
 7                         I/O channel 7
 8                         I/O channel 8
 9                         I/O channel 9
10                         I/O channel 10
11                         I/O channel 11
12                         I/O channel 12
13                         I/O channel 13
14                         I/O channel 14
15                         I/O channel 15
16                         Monitor Interval Timer

3.3.2 Buffer Unit

Each Vector Unit contains a buffer capable of
holding from 1024 to 8192 64-bit operands, depending
upon machine configuration. The total
configuration of eight buffers (plus one spare)
constitutes the Buffer Unit. See Figure 3.3-6. The
basic configuration of 1024 operands provides a
capacity of 8192 total operands that can be held
within the Vector Processor. In the maximum
configuration this can be as high as 65,536 operands.

Although the buffers are physically contained within
each Vector Unit to limit access times, they are
treated as logically separate entities on a par with
the Map and Swap Units. Thus there is a separate
control for addresses for reading and writing and the
format of the operands (32 or 64-bit).

[Figure 3.3-6: block diagram of the Buffer Unit.
Labels: BUFFER, 8192 WORDS (1024 x 8) + SECDED PER
32 BITS; TO VECTOR UNITS]

NOTES:

1. S1, S2 FROM MAP UNIT

2. AR1, AR2 FROM VECTOR UNIT

3. ALL DATA PATHS CARRY SECDED
BUT ONLY THE DATA BITS CARRIED
IN A PATH ARE NUMBERED

Figure 3.3-6 Buffer Unit

3.3.2.1 Read and Write

Each buffer is capable of reading two independent
operands every minor cycle, as well as writing two
independent result operands per minor cycle. The
selection of which operands to write into the buffer
is performed by write data selectors 1 and 2 (see
Figure 3.3-6). Note from the figure that any bus can
feed either of the two select networks at the same time.
Thus it is possible to input a data stream from S1
(the Map Unit) to both Write Buffer 1 (WB1) and
Write Buffer 2 (WB2) simultaneously.

The write addresses of WB1 and WB2 are independent
and are set up by separate Buffer Unit suboperations.
Thus in the illustration, S1 could be written into
two separate, independent areas of the Buffer Unit
simultaneously.

Read operations can proceed from independent
addresses in the same minor cycle since the buffer is
composed of high-speed ECL RAMs allowing random
addressing at the rate of two per minor cycle. As
operands are read from the buffer and placed on the
designated trunks, a "data valid" signal is placed on
the corresponding trunk control lines. In addition,
the format of the data (32 or 64-bit) is also
flagged on the respective trunks to the Vector Unit.

The four ports providing input operands are connected
to the Map Unit (S1 and S2) and the output result bus
of the Vector Unit (AR1 and AR2).

3.3.2.2 Control

The Buffer Unit possesses its own control logic,
despite the fact that it is intimately connected
within the host Vector Unit. The control scheme is
based on a loadable microcode which handles the
interpretation of the particular suboperation code,
the setting up of addresses, and the control of
incrementing of address counters and testing of
termination thresholds for all vectors whose source
or destination is within the Buffer Unit.

3.3.2.3 Error Checking

All data paths within the Buffer Unit carry single
error correction, double error detection (SECDED)
codes with each 32-bit operand. In the event that a
read operation causes the discovery of a single-bit
error, the data will be corrected and the unit will
not halt. However, the address at which the data
was read will be "latched up" for sampling by the
MCU, along with the SECDED syndrome bits, and the
MCU will be sent an error flag.

In the event of a double-bit error, the Vector Unit
will be halted, a stop signal sent to the Swap, Map,
and Scalar Units and an error flag transmitted to
the MCU.

3.3.2.4 Control Signals

(To be defined later)

3.3.2.5 Microcode Terms

(To be defined later)

3.3.2.6 Interface Timing

(To be defined later)

3.3.3 Map Unit

Figure 3.3-7 gives a general block diagram of the
Map Unit. This unit is divided into 12 functional
elements, each of which contains its own control
microcode, and thus is able to operate somewhat
independently of the other elements. This feature
is primarily intended to facilitate fault isolation
and maintenance. The Map Unit controls all accesses
to Main Memory for reading and writing by the
Vector Unit or the Map Unit itself. The Map Unit
also contains certain vector functions which it can
perform itself (SCATTER, GATHER, COMPRESS, MASK,
MERGE).

[Figure 3.3-7: block diagram of the Map Unit.
Label: FROM SCALAR PROCESSOR]

NOTE:

ALL DATA BUSSES
ALSO CARRY SECDED

Figure 3.3-7 Map Unit

3.3.3.1 READ 1 and READ 2

The Map Unit contains two identical bus read
elements, termed the READ 1 bus and control and the
READ 2 bus and control networks. Each element
provides the addressing, address incrementing and
data bus buffering required for the interconnection
to Main Memory busses.

3.3.3.1.1 Error Checking

All data busses within the Map Unit provide 7 bits
of SECDED code for every 32 bits of data. To assist
in fault isolation, each read bus element contains a
SECDED error checking network for its respective
input port. In the event that a single-bit error is
discovered, the contents of the respective address
counter, an error flag, and the syndrome bits are
held in MCU interface registers for sampling by the
MCU.

In the event that a double-bit error occurs, all of
the above actions take place, but in addition, the
entire Map Unit is halted, a stop signal is sent to
the Vector, Swap, and Scalar Units, and a fatal
error signal is transmitted to the MCU.
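
Seven check bits per 32 data bits is the size of a standard
extended Hamming code (six Hamming check bits plus one overall
parity bit). The actual code used by the FMP is not given here,
so the Python sketch below is only a textbook construction with
the same bit counts; the function names and the list-based
codeword layout are hypothetical. It shows the two outcomes
described above: a single-bit error is corrected and reported
through the syndrome, while a double-bit error is detected but
not corrected.

```python
CODE_BITS = 39                                   # 32 data + 7 check bits
CHECK_POSITIONS = [1, 2, 4, 8, 16, 32]           # Hamming check bits
DATA_POSITIONS = [p for p in range(1, CODE_BITS) if p not in CHECK_POSITIONS]


def encode(data: int) -> list:
    """Return a 39-bit codeword as a list; index 0 is the overall parity bit."""
    code = [0] * CODE_BITS
    for i, pos in enumerate(DATA_POSITIONS):
        code[pos] = (data >> i) & 1
    for c in CHECK_POSITIONS:
        code[c] = sum(code[p] for p in range(1, CODE_BITS) if p & c and p != c) % 2
    code[0] = sum(code[1:]) % 2
    return code


def decode(code: list):
    """Return (data, status, syndrome); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for p in range(1, CODE_BITS):
        if code[p]:
            syndrome ^= p
    parity_even = sum(code) % 2 == 0
    if syndrome == 0 and parity_even:
        status = "ok"
    elif not parity_even and syndrome < CODE_BITS:
        code[syndrome] ^= 1      # syndrome 0 means the parity bit itself flipped
        status = "corrected"
    else:
        return None, "uncorrectable", syndrome   # e.g. a double-bit error
    data = sum(code[pos] << i for i, pos in enumerate(DATA_POSITIONS))
    return data, status, syndrome
```

In hardware terms, the 'corrected' case corresponds to latching
the syndrome and address for the MCU without halting, and the
'uncorrectable' case to the halt and fatal error signal.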

3.3.3.1.2 Data Movement

Each data bus can move 512 bits (plus SECDED) of
data every minor cycle. All requests to memory
yield a full 512-bit data word, while the outputs of
the read bus elements can emit 512, 128, 64 or
32-bit data items.


3.3.3.1.3 Address Control

Each bus element manages its own addressing control
of memory access. Initial addresses are sent to the
bus elements by the map control element. In
addition, address increment values (+1, -1 or some
prescribed variable N) are sent to each buffer
element by the map control. An additional set of
control lines from the map control indicates what
addressing modes and what data widths are required
of the bus control element. A special address
increment port is supplied by the READ 3 bus for
the indexed GATHER operation.

The modes of operation for the address logic are:

1. Full streaming -- In this mode data is moved at
the maximum rate supplied by the memory system.
Normally 512 bits of data are transmitted to
the input port each minor cycle; 512 bits are
passed directly to the S1 and S2 output ports
which supply the Vector and Buffer Units with
data. This mode is used for all vector
arithmetic operations when all operands are
the same size. For example, if the operation
performed is a memory-to-memory vector
addition with input and result operands all 64
bits wide, the Map Unit will provide data at
streaming rates. However, if one of the
operand streams is not the same size (say one
32-bit input and one 64-bit input operand),
then the bus supplying the 32-bit operands will
move half as much data per minor cycle in order
to synchronize with the 64-bit data movement.

2. Half streaming -- In this mode memory requests
are not made each minor cycle, but are made
every other minor cycle. This case arises for
the mixed 32/64-bit mode previously discussed.

3. Word or half-word streaming -- Depending on the
operand size, the input read streams can be
moved at regular intervals in word or half-word
increments. This mode is used for the COMPRESS
and MASK operations which guarantee that new
elements will be used every minor cycle.

4. Burp mode -- This mode moves word or half-word
elements, as needed, by the functional element
in the Map Unit. Its peak rate is one data
element every minor cycle, while it is possible
for many minor cycles to pass before the data
element is required. The major use for this
mode is in the vector MERGE operation where
data elements are moved from the read stream
depending on the presence of one-bits in the
corresponding order vector.

5. Read reverse mode -- The memory system and
addressing and data disassembly networks are
capable of streaming data in any of the
previous modes in reverse order if the address
increment is -1.

6. Random access mode -- All previous modes deal
with sequentially accessed data, moving such
data at rates prescribed by the operation in
process. In the case of the vector GATHER
operation, data is accessed non-sequentially
from the Main Memory. Two submodes are provided
in this case: fixed increment and variable
index.

In the fixed increment mode, the map control
element provides a positive or negative integer
value which is used as an increment for each
new memory address. Thus, instead of the
normal increment of +1, any integral value can
be used in this mode. In such cases, for
example, the memory addresses produced are M,
M+N, M+2N and so forth, where M is the initial
address and N is the fixed increment.

In the variable index mode, the value of the
memory address is computed from the initial
address M and the contents of a list of integers
I. For each integer (positive or negative) in
the list, a memory address is computed and a
corresponding request sent to memory at the
rate of one per minor cycle, per bus control
element. The memory request will yield either
one 64-bit or one 32-bit operand depending on
the format desired by the operation.
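
The two random access submodes can be sketched as address
generators in Python. The fixed increment sequence M, M+N, M+2N
is taken from the text; for the variable index mode the text
says only that the address is computed from M and each list
entry, so the simple sum M + I used here is an assumption, as
are the function names. Addresses are in whatever units the
hardware uses for operands.

```python
from itertools import islice


def fixed_increment_addresses(m: int, n: int):
    """Yield M, M+N, M+2N, ... (N may be negative; N = -1 reads in reverse)."""
    address = m
    while True:
        yield address
        address += n


def variable_index_addresses(m: int, indexes):
    """Yield one address per list entry, assuming address = M + index."""
    for i in indexes:
        yield m + i


# First four addresses of a stride-3 access starting at 100.
first_four = list(islice(fixed_increment_addresses(100, 3), 4))
```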

The two bus control elements (READ 1 and READ 2) are
capable of transmitting simultaneous memory requests,
if the addresses are odd and even, respectively.

Thus a peak rate of two random access requests per
minor cycle is possible with the two elements
operating in tandem.

3.3.3.2 READ 3 Bus and Control

A third input bus is provided in the Map Unit for
handling special streams of data used in the Map
Unit.

3.3.3.2.1 Error Checking

SECDED error codes are carried on the READ 3 data
bus, but are not passed on to other Map Unit
elements as are the READ 1 and READ 2 data bus error
codes. This is due to the fact that READ 3 data is
disassembled into bit streams or index streams and
the 32-bit SECDED parcel is no longer intact. Thus
the READ 3 bus element provides error checking of
the input data only.

Error handling, correction and reporting to the
MCU are identical to that supplied by the READ 1 and
READ 2 bus elements. See paragraph 3.3.3.1.1.

3.3.3.2.2 Data Movement

The movement of data within and out of the READ 3
bus element is more complex than in its READ 1 and
READ 2 counterparts. This is due to the nature of
operands provided by the READ 3 element:

1. Control vector -- The writing of data into
Main Memory on the WRITE 1 data bus can be
controlled down to the 32-bit level. That is,
in any given minor cycle, 512 bits of data can
be transmitted to the Memory Interchange, but
any combination of 32-bit operands within that
512 bits can be suppressed (not written into
memory). This action is controlled by the group
of bits called control vector bits. When
operating a vector to memory operation at full
streaming rate, 16 32-bit quantities are
transmitted to memory. If the control vector
operation is invoked (see instruction
specification for 9D instruction), a group of 16
bits is provided by the READ 3 bus control
element. Thus READ 3 must, in full streaming
mode, be capable of disassembling the 512-bit
input data sword (super-word) into groups of
16-bit parcels, one parcel per minor cycle.

2. Order vector -- The vector operations MASK, MERGE,
and COMPRESS derive their control of data
movement from a group of bits called order
vector bits (taken from Iverson's APL). The
presence of a bit in the order vector may mean
the transmission of that corresponding data
element in the READ 2 stream (in the case of
vector MASK operations). In this mode, bits are
transmitted in groups of sixteen to the
appropriate functional element (MASK, MERGE or
COMPRESS networks) at the rate required by that
element. In normal operation four bits are
moved (as are four operands) every minor cycle,
thus requiring the READ 3 bus to provide 16 bits
every four minor cycles.

3. Indexed list -- When performing the vector
operations GATHER or SCATTER using a list of
indexes, the READ 3 bus supplies these indexes
at the rate required by the particular bus
element. Indexes always fill a 64-bit operand,
although the maximum index requires only 16
bits. Thus indexes are moved at the rate of
two words (64 bits wide) every clock cycle
requested by the READ 1 and READ 2 controls.

Since memory requests in this mode are
essentially random, the rate of data movement
is not predictable, due to memory conflicts
between READ 1 and READ 2 or memory busy
conditions due to previous requests. Thus the
memory address and request control operates in
a form of burp mode.
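
The disassembly of a sword into 16-bit parcels, as needed for
control and order vectors, can be sketched in Python as follows.
A 512-bit sword holds 32 such parcels. Taking the left-most
parcel and the left-most bit first is an assumption about
ordering, and the function names are hypothetical.

```python
SWORD_BITS = 512
PARCEL_BITS = 16


def sword_to_parcels(sword: int) -> list:
    """Split a 512-bit sword into 32 16-bit parcels, left-most parcel first."""
    count = SWORD_BITS // PARCEL_BITS
    mask = (1 << PARCEL_BITS) - 1
    return [(sword >> (PARCEL_BITS * (count - 1 - i))) & mask for i in range(count)]


def parcel_to_bits(parcel: int) -> list:
    """Expand a parcel into 16 individual bits, left-most bit first."""
    return [(parcel >> (PARCEL_BITS - 1 - i)) & 1 for i in range(PARCEL_BITS)]
```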

3.3.3.2.3 Address Control

The management of addressing and memory requests is
much simpler in the READ 3 control element, since
all data is sequentially accessed. Note that the
maximum rate at which READ 3 is required to deliver
operands (in the indexed list mode) is two 64-bit
words each minor cycle. This means that when the
Vector Unit is running in full streaming, READ 3 is
making memory requests every 16 minor cycles, and
when not in full streaming (SCATTER/GATHER), READ 3
is requesting memory every 4 minor cycles. This
lower rate for READ 3 makes it possible to share the
residual memory bandwidth with the Swap Unit and
instruction stream requests issued by the Swap and
Scalar Units.

3.3.3.3 WRITE 1 Bus and Control

The WRITE 1 Bus (W1) provides the output port for
the Vector and Map Units back to Main Memory.

3.3.3.3.1 Error Checking

The WRITE 1 bus provides 7 bits of SECDED code for
every 32 bits of data transmitted. SECDED codes are
generated by each of the functional components of
the Vector and Map Units, so that WRITE 1 control
only checks for errors in the operands being
transmitted through it. This feature is provided
for fault isolation and maintenance procedures.

Error checking, correction and MCU communications
are the same as for READ 1, READ 2 and READ 3.

3.3.3.3.2 Data Movement

The WRITE 1 bus is capable of transmitting 512 bits
plus SECDED each minor cycle when in streaming mode.

In addition to the 512 bits, a group of sixteen bits
called write enables are transmitted to enable the
storage, or suppression, of any 32-bit quantity
transmitted to the memory system. These write
enables permit the storage of vectors beginning at
memory addresses other than 512-bit boundaries, and
the ending of vectors on other than 512-bit
boundaries.

The write enables are also used to transmit control
vector bits for selective storage suppression when
invoked by that particular suboperation for the Map
Unit.
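
The effect of the sixteen write enables can be illustrated with
a short Python sketch that treats one 512-bit transfer as a list
of sixteen 32-bit operands. The function names are hypothetical,
and the polarity of the control vector bits (1 = store,
0 = suppress) is an assumption.

```python
OPERANDS_PER_SWORD = 16  # sixteen 32-bit quantities per 512-bit transfer


def boundary_enables(first: int, last: int) -> list:
    """Write enables for a sword in which only operands first..last belong to the vector."""
    return [1 if first <= i <= last else 0 for i in range(OPERANDS_PER_SWORD)]


def apply_control_vector(enables: list, control_bits: list) -> list:
    """Further suppress operands whose control vector bit is 0 (assumed polarity)."""
    return [e & c for e, c in zip(enables, control_bits)]


def masked_write(memory_sword: list, data: list, enables: list) -> list:
    """Return the sword as it would stand in memory after the masked store."""
    return [d if e else m for m, d, e in zip(memory_sword, data, enables)]
```

For a vector that begins at the fourth 32-bit operand of a
sword, boundary_enables(3, 15) leaves the first three operands
in memory untouched.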

3.3.3.3.3 Address Control

WRITE 1 is capable of writing sequential data at
streaming rates, or writing data at fixed increments
(as READ 1 and READ 2 can read data at fixed
increments), or writing data based on a list of
indexes provided by the READ 3 trunk (SCATTER
operation).

3.3.3.4 GATHER Assembly Network

When performing the GATHER operation, the individual
operands (either two 64 or two 32-bit operands)
which are delivered each minor cycle by the READ 1
and READ 2 busses must be assembled into a
contiguous vector. This network provides that
function. In addition to handling the maximum rate
of operand input, the network must also be able to
handle burp rates as memory and bus conflicts
interrupt the smooth flow of data.

All data gathered includes its corresponding SECDED
codes (which are based on 32-bit parcels), but no
error checking or correcting logic is included.

3.3.3.5 S1 and S2 Multiplexors

The Map Unit provides two data ports supplying the
Vector and Buffer Units. These are the S1 (Source 1)
and S2 (Source 2) ports appearing in Figure 3.3-7.
These multiplexors are provided to permit the
selection of one of the Map Unit functional elements
(GATHER, COMPRESS, MASK/MERGE) or the contents of
READ 1 bus or READ 2 bus as inputs to the source
stream going to the Vector or Buffer Units. Both
busses can move data at the maximum streaming rate,
or at slower rates depending on the specified
function. For every operand segment transmitted, the
S1 and S2 busses must receive an accept signal on
their respective control lines before moving a new
data quantity onto the bus.

3.3.3.6 WRITE 1 Select

This network provides a simple selection multiplexor
for the stream to be written back to memory. No
error checking logic is included although the busses
carry the SECDED codes.

3.3.3.7 MERGE/MASK Network

The vector functions MERGE and MASK are provided by
this element. Inputs are data streams from READ 1
and READ 2 at 128 bits apiece. A disassembly
register is provided for each stream to break down
the data into 64 or 32-bit operands.

Data is moved at the rate governed by the input
order vector (sixteen bits at a time) and the
specified function:

1. MERGE -- The MERGE operation has two modes,
replace and shuffle. In the replace mode, the
vector input on READ 1 is moved through the
element at the rate of four operands per minor
cycle. The order vector is moved at the rate of
four bits per minor cycle. When a one-bit is
found in the order vector, a data element from
READ 2 is substituted for the corresponding data
element in the READ 1 vector, and the next READ
2 data element is moved up to await insertion.
Thus READ 2 is moved at the rate of one-bits
appearing in the order vector.

In the shuffle mode, READ 1 is advanced for
every one-bit in the order vector and READ 2 is
advanced for every zero-bit in the order vector.
When a stream is advanced, one operand from that
stream is moved into the output stream. In both
cases the output stream is moved at the rate of
four operands per minor cycle.

2. MASK -- In the MASK operation all streams, input
and output, are moved at the rate of four
operands per minor cycle. In this case, if an
operand is not used from the input streams it is
thrown away.
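
The element-level behavior of the two MERGE modes and of MASK
can be sketched in Python, ignoring the four-per-cycle timing.
The function names are hypothetical. For MASK, the choice that a
one-bit selects the READ 2 element follows the order vector
description in 3.3.3.2.2 and should be treated as an assumption.

```python
def merge_replace(read1, read2, order):
    """A one-bit replaces the READ 1 element with the next READ 2 element."""
    r2 = iter(read2)
    return [next(r2) if bit else a for a, bit in zip(read1, order)]


def merge_shuffle(read1, read2, order):
    """A one-bit takes the next READ 1 element, a zero-bit the next READ 2 element."""
    r1, r2 = iter(read1), iter(read2)
    return [next(r1) if bit else next(r2) for bit in order]


def mask(read1, read2, order):
    """Both inputs advance every step; the unused element is thrown away.

    Assumption: a one-bit selects the READ 2 element.
    """
    return [b if bit else a for a, b, bit in zip(read1, read2, order)]
```

Note that in replace mode READ 2 is consumed only on one-bits,
whereas in MASK both streams are consumed at the same rate.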

SECDED error codes are carried through this element,
but no checking or correcting is done there.

3.3.3.8 COMPRESS Network

The COMPRESS network operates in a similar fashion
to the MASK/MERGE network; however, it utilizes only
one input stream, READ 1, and produces output data
at the rate at which one-bits appear in the order
vector. The READ 1 stream is moved at the rate of
four operands per minor cycle; the order vector is
moved at the rate of four bits per minor cycle. In
the event that the order vector was completely
filled with one-bits, the output stream would move
at the rate of four operands per minor cycle. On
the other hand, if the order vector was completely
vacuous, no output would be produced, but the entire
READ 1 vector would be input and thrown away.
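
At the element level, COMPRESS keeps the READ 1 operands whose
order vector bit is one. A minimal Python sketch, with a
hypothetical function name and no modeling of timing:

```python
def compress(read1, order):
    """Keep the READ 1 elements whose order vector bit is one; discard the rest."""
    return [a for a, bit in zip(read1, order) if bit]
```

The standard library function itertools.compress performs the
same selection on arbitrary iterables.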

3.3.3.9 Order Vector Assembly

One set of vector functions permits the creation of
the order or control vector based on arithmetic
comparisons performed in the Vector Units. In this
mode each Vector Unit transmits 3 control bits
indicating the state of comparison of a set of
operands in a given clock cycle. This comparison
can only be performed by one of the two back-end
(final) adders (BADD1 or BADD2) in the Vector Units.
The condition codes transmitted are: A=B, A>B, A<B.

The condition codes are then selected by the Map
Unit for any combination (e.g. A<=B) of conditions
and used to form a bit vector indicating the truth or
falsity of the selected condition. The resulting
order/control vector is then stored into memory
via the WRITE 1 bus.

Since the order/control vector is generated at this
point, the assembly network also generates the
SECDED codes needed for all operands. Order/control
vectors are formed at the rate of 8 bits per minor
cycle (8 pipeline units at one condition each per
minor cycle).
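
The selection of condition codes into an order/control vector
can be sketched in Python. The three-bit encoding (EQ, GT, LT)
and the function names are hypothetical; the text states only
that three condition bits are transmitted and that any
combination of them can be selected.

```python
EQ, GT, LT = 0b100, 0b010, 0b001  # hypothetical encoding of the 3 condition bits


def condition_code(a, b) -> int:
    """The condition bits one Vector Unit would report for an operand pair."""
    return EQ if a == b else GT if a > b else LT


def order_vector(a_values, b_values, select: int) -> list:
    """Bit is 1 where any selected condition holds, e.g. select=LT | EQ for A<=B."""
    return [1 if condition_code(a, b) & select else 0
            for a, b in zip(a_values, b_values)]
```

With eight Vector Units each contributing one comparison per
minor cycle, eight such bits would be produced per cycle.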

3.3.3.10 Map Control 

The map control element provides the interface
between the Instruction Issue Unit and the Map Unit.
When a map instruction (op code 9D) is detected by
the Issue Unit, it is transmitted (along with all
suboperation parcels) to the map control element
which then makes any necessary
register file references, forms the starting
addresses, increments, and control signals and
transmits them to the appropriate Map Unit elements.
The Map Unit is responsible for releasing the Issue
Unit to go on issuing further instructions.

Many conditions arise during vector and map
operations which can affect the contents of the data
flag register. The map control element deskews all
such data flag information and makes the appropriate
changes in the data flag register.

3.3.3.11 Pipeline Selection

Figure 3.3-8 gives a block diagram overview of the
interconnection of the Map Unit and the Vector Units.
In this diagram, the Buffer Unit is not shown, but
instead is considered imbedded within the respective
Vector Units.

As can be seen from the figure, there are actually
nine physically distinct Vector Units comprising the
Floating-Point Ensemble. Nine input select and
eight output select networks are housed within the
Map Unit to provide connections to the Vector Units
and the input and output data busses. Only one of
the input trunks is shown here, labeled S(0) through
S(7), corresponding to the source trunk S1
emerging from the Map Unit. Also shown is a special
data trunk labeled maintenance data, which can be
selected into any or all of the nine physical Vector
Units. The selection of maintenance data in and
maintenance data out is under the control of the
Maintenance Control Unit (MCU).

Figure 3.3-8 Map/Vector Interconnection

3.3.3.11.1 Normal Operation of Selection Networks

Upon deadstart of the FMP, the Maintenance Control 
Unit (MCU) sets up the input data selection networks 
and output data selection networks for eight 
pipelines. Normally the pipelines would be 
configured with Vector 0 through Vector 7 on-line to 
the input and output data trunks. In addition, the 
data trunk of the adjacent Vector Unit (in this case 
S(7)) would be enabled to the extra Vector Unit (in 
this case Vector 8). The output of Vector 8 would 
not be selected into WRITE 1 (W1), but could be 
sampled by the Maintenance Control Unit. The same 
selection would be made for the S2 (Source 2) bus 
from the Map Unit. Thus during execution of vector 
arithmetic instructions, Vector 8 (in this example) 
would be performing identical operations on data 
identical to that submitted to Vector 7. Thus the 
internal arithmetic elements and checking circuitry 
of the excess unit are continuously exercised. 

In the event that the excess unit discovers an error 
in its own operation (checker failure, parity error 
or SECDED double error), the Vector Unit will be 
halted but no stop flag will be sent to any other 
units. The Maintenance Control Unit (MCU) will be 
alerted, however. Under control of the MCU, special 
data trunks can be connected to the input and output 
of the excess unit and fault isolation diagnostics 
executed with selected data being forced into the S1 
and S2 ports of the failing unit. This technique 
permits the on-line maintenance of a failing Vector 
Unit. 

3.3.3.11.2 Error Recovery and Maintenance 

In the event that an error is detected in one of the 
on-line Vector Units, the entire FMP is halted and 
the job in progress is aborted. Before another job 
is started the MCU will switch the data bus selects 
so that the excess unit is introduced into the 
system, and the failing unit removed. For example, 
if Vector 4 were to fail and thus be switched 
off-line, the input selects would be changed so that 
S(4) would now go to Vector 5, S(5) to Vector 6, 
and so on with S(7) now gated to the previously 
off-line Vector 8. At the same time the output 
selects would be changed in similar manner, as well 
as maintenance communications enabled with Vector 4 
through the data busses. 


This scheme permits the use of any Vector Unit as 
the excess unit, depending on the MCU controls set up; 
thus all pipelines can be continuously exercised in 
an on-line manner throughout the operating day. In 
such instances, the Maintenance Control Unit could 
rotate the assignments between jobs. 
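
The trunk reassignment described above can be illustrated with a small sketch. It is not taken from the specification: the function name `input_selects` and the rule "on-line units are taken in ascending order" are assumptions chosen to reproduce the Vector 4 example in the text, and the shadowing of S(7) by the excess unit during normal operation is not modeled.

```python
NUM_TRUNKS = 8   # source trunks S(0) through S(7)
NUM_UNITS = 9    # physical Vector Units, Vector 0 through Vector 8


def input_selects(offline_unit=8):
    """Map each source trunk index to the Vector Unit that receives it.

    Assumption: the on-line units are all units except `offline_unit`,
    taken in ascending order, so trunks at or above a removed unit
    shift up by one unit.
    """
    if not 0 <= offline_unit < NUM_UNITS:
        raise ValueError("offline_unit must be in the range 0..8")
    online = [u for u in range(NUM_UNITS) if u != offline_unit]
    return dict(zip(range(NUM_TRUNKS), online))


normal = input_selects()          # Vector 8 is the excess unit
after_failure = input_selects(4)  # Vector 4 switched off-line
print(normal)
print(after_failure)
```

Calling `input_selects` with a different unit number between jobs corresponds to the rotation of the excess-unit assignment mentioned in the text.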

3.4 Main Memory 

Main Memory is a single-level, random-access 
memory using bipolar 4K-bit integrated circuits. The 
memory words are 78 bits which provide for a 64-bit 
data word and 7 bits of single error correction 
double error detection (SECDED) for each 32-bit 
half-word. The semiconductor memory access time is 
40 nanoseconds, where access time is defined as the 
time from the address reaching memory until data is 
clocked out of the memory. This memory is directly 
addressable in either monitor mode or job mode. 
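
The figure of 7 check bits per 32-bit half-word is consistent with a conventional extended Hamming code. The sketch below is only a plausibility check under that assumption; the specification does not state which SECDED code is used, and the function name is hypothetical.

```python
def secded_check_bits(data_bits):
    """Check bits for an extended Hamming SECDED code (assumed code family).

    Single error correction needs the smallest r with
    2**r >= data_bits + r + 1; one extra overall parity bit
    adds double error detection.
    """
    r = 0
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r + 1


half_word_bits = 32 + secded_check_bits(32)   # data plus check bits
word_bits = 2 * half_word_bits                # two half-words per word
print(secded_check_bits(32), half_word_bits, word_bits)
```

Under this assumption the half-word is 39 bits and the full word is 78 bits, matching the widths quoted in the text.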

The basic Main Memory size is two million words 
with expansions to four or eight million words 
available as field upgrade options. 

Each two million words of Main Memory contains 16 
memory stacks each having 256K 39-bit half-words (32 
data bits plus 7 SECDED bits). Each 256K stack is 
arranged in eight phased banks. In streaming mode, a 
reference will be made simultaneously to the same 
address in each of the 16 memory stacks to obtain a 
super-word (SWORD) of 512 data bits. Memory busy 
conflict rules take into account the 16 physically 
independent stacks and the eight-bank phasing within 
each stack to treat the bank address in each of the 
16 stacks as a separate entity. Thus, it could be 
said that each two million words of Main Memory 
contains 128 phased half-word banks. 
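
The stack and bank counts above can be cross-checked with simple arithmetic. This is an illustrative sketch; it assumes "K" means 1024 and that "two million words" means 2 x 2^20 words, neither of which the text states explicitly.

```python
STACKS = 16
HALF_WORDS_PER_STACK = 256 * 1024   # assumes K = 1024
BANKS_PER_STACK = 8
DATA_BITS_PER_HALF_WORD = 32

words = STACKS * HALF_WORDS_PER_STACK // 2              # two half-words per word
sword_data_bits = STACKS * DATA_BITS_PER_HALF_WORD      # one half-word from every stack
phased_banks = STACKS * BANKS_PER_STACK
bank_depth = HALF_WORDS_PER_STACK // BANKS_PER_STACK    # half-words per bank

print(words, sword_data_bits, phased_banks, bank_depth)
```

The resulting bank depth of 32K half-words agrees with the bank size quoted for the memory stack.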

The eight-bank phasing plus the physical distribution 
of the memory stacks allows memory references to be 
made at a maximum rate of one every 10 nanosecond 
minor cycle for each two million words of memory. 
Thus, the Main Memory has very high data transfer 
bandwidths: 

two million words = 512 bits/minor cycle 

four million words = 1024 bits/minor cycle 

eight million words = 2048 bits/minor cycle 
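
Expressed in bits per second, the per-cycle figures above work out as in the following sketch. The conversion is plain arithmetic on the stated 10 nanosecond minor cycle; the function name is illustrative only.

```python
MINOR_CYCLE_NS = 10
# Bits transferred per minor cycle, keyed by memory size in millions of words.
BITS_PER_MINOR_CYCLE = {2: 512, 4: 1024, 8: 2048}


def bandwidth_bits_per_second(million_words):
    """Peak transfer rate for a given memory size, in bits per second."""
    bits = BITS_PER_MINOR_CYCLE[million_words]
    return bits * 1_000_000_000 // MINOR_CYCLE_NS


for size in sorted(BITS_PER_MINOR_CYCLE):
    print(size, "million words:", bandwidth_bits_per_second(size), "bits/s")
```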

3.4.1 Memory Stack 

The memory stack is packed in a freon-cooled 0.5 
cubic ft. area with 8 banks, each 32K x 40 bits. The 
FMP utilizes thirty-two bits for data and seven 
bits for SECDED. There are three board types used in 
the stack: input control, storage, and output. 

Figure 3.4-1 shows the module organization which 
lends itself to massive use of distributed loading 
and emitter-ANDing and also results in "zero-skew" 
construction which equalizes signal paths through all 
memory chips to maintain identical timing throughout 
the stack. 


Figure 3.4-1 Memory Stack 
(bank control and timing; input board A: banks 0, 2, 4, 6, write data 0-19; 
input board B: banks 1, 3, 5, 7, write data 20-39; storage boards C through K: 
banks 0-7, bits 0-19; storage boards L through T: banks 0-7, bits 20-39; 
board U: output terminators, read data registers) 

108 coax lines connect each memory stack to the 
Memory Interchange. All signals on the lines except 
the read data are sent from the interchange to the 
stack. Below is a list of these lines: 

Clock (2) - One for each input board to 
synchronize the memory stack to the interchange. 

Absolute Address (15) - Twelve address bits for 
the selection of the 4K memory chips and three 
address bits for the selection of the eight 
ranks of memory chips. 

Bank Address (6) - Three for each input board 
which are decoded for the selection of the 
eight banks within a stack. 

Stack Request (2) - One for each input board 
which are decoded for selection of a unique 
memory stack. 

Write Control (2) - One for each input board to 
inform the stack of a write memory cycle. 

Write Data (39) - 39 data bits to memory, 32 for 
data, 7 for SECDED. Bits 0-19 on the "A" 
input board, and bits 20-38 on the "B" input 
board. 

Sync (1) - This signal provides a point of time 
reference for maintenance purposes. 

Master Clear (2) - One for each input board. 

Read Data (39) - 39 read data bits from the 
read data registers on the output board back to 
the interchange. 
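
As a consistency check, the per-signal line counts in the list above can be summed and compared with the stated total of 108 coax lines. The dictionary below simply transcribes the list; the variable names are illustrative.

```python
# Coax lines per signal group between one memory stack and the interchange.
COAX_LINES = {
    "Clock": 2,
    "Absolute Address": 15,
    "Bank Address": 6,
    "Stack Request": 2,
    "Write Control": 2,
    "Write Data": 39,
    "Sync": 1,
    "Master Clear": 2,
    "Read Data": 39,
}

total_lines = sum(COAX_LINES.values())
# Only the read data travels from the stack back to the interchange.
lines_to_stack = total_lines - COAX_LINES["Read Data"]
print(total_lines, lines_to_stack)
```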

3.4.2 Memory Configuration 

Memory stacks are located as shown in Figures 3.4-2 
and 3.4-3. There are eight stacks per memory section 
and these sections are positioned around the Memory 
Interchange (Figure 3.4-2). Figure 3.4-3 shows 
where the various half-word segments in a sword of 
data reside in memory and where stacks reside in a 
section. Section positioning in this diagram is shown 
for data clarity and is not the true physical 
positioning. 

(to be supplied later) 

Figure 3.4-2 Section Designations FMP Memory 

3.4.3 Memory Interchange 

The Main Memory has 512 banks, each bank 39 bits wide 
(32 + 7 SECDED) by 32K words deep. See Figure 3.4-4. 
The memory has two separate access control networks, 
each network connected to one-half (256 banks) of 
memory. The "EVEN" control network addresses and 
passes data to and from the even numbered 1024-bit 
groups and the "ODD" control network the odd numbered 
groups. This is done to enable two separate accesses 
to memory simultaneously. 

On the other side of the Memory Interchange are the 
connections to the Map Unit and the other units that 
require access to memory - the Scalar Processor and 
the Swap Unit. The memory access is controlled 
through four memory ports. Three of the ports are 
dedicated connections to the Map Unit (R1, R2, W1) 
and the remaining port is time shared between the 
remaining read/write buses (R3 and swap). 

For vector operations R1, R2 and R3 make requests 
for 2048 bits of data thus requiring both even and 
odd control networks. For R1 and R2, 2048 bits of 
data are held and sent 512 bits at a time to the Map 
Unit. W1 accepts 512 bits and assembles them into 
2048 bit groups. This assembly/disassembly means 
that a port requires access to memory one cycle in 
every four. Thus the map ports can use up to 3/4 of 
the memory bandwidth. 
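
The "one cycle in every four" and "3/4 of the memory bandwidth" figures follow from the group and transfer sizes quoted above. The sketch below reproduces that arithmetic; it assumes the three dedicated map ports (R1, R2, W1) never collide, which is an idealization not stated in the text.

```python
from fractions import Fraction

GROUP_BITS = 2048      # bits fetched or stored per memory access
TRANSFER_BITS = 512    # bits moved to or from the Map Unit per cycle
MAP_PORTS = ("R1", "R2", "W1")

# A port drains (or fills) one 2048-bit group in this many cycles,
# so it needs a memory access only once per that many cycles.
cycles_per_access = GROUP_BITS // TRANSFER_BITS

# Fraction of memory cycles the three map ports can consume together.
map_port_share = Fraction(len(MAP_PORTS), cycles_per_access)
print(cycles_per_access, map_port_share)
```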

The remaining port into memory is divided three ways: 
the Swap Unit, R3 to the Map Unit, and R3 to the 
Scalar Processor. The Swap Unit will use 128 data 
bits per cycle and will always address memory 
sequentially. On a swap request the port will get or 
store 1024 bits from/to memory. Thus the Swap Unit 
can make a memory request every eight cycles and data 
will move to/from the Swap Unit as a 512-bit move 
every four cycles. The Map Unit uses R3 in either of 
two modes - to get indexes for the GATHER/SCATTER 
instructions or to get bit strings (order vectors). 

In the first case the Map Unit can use up to 128 bits 
on a cycle, thus requiring a memory request every 
eight cycles. In the second case the Map Unit uses 
up to eight bits on a cycle, thus requiring a memory 
request only every 128 cycles since the port will 
always request 1024 bits of data. The R3 connection 
to/from the Scalar Processor is used for two 
purposes. The first is to fetch instructions into 
the Issue Unit and the second is for load and store 
requests to/from the scalar Register File. Both of 
these functions are highly asynchronous. The Issue 
Unit has a buffer to hold up to 4096 bits of 
instruction so as to enable program loops of 
reasonable size without making repeated memory 
requests. 
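
The request intervals quoted for the shared port (every eight cycles for the Swap Unit and for GATHER/SCATTER indexes, every 128 cycles for order vectors) all follow from dividing the 1024-bit request size by the consumption rate. A minimal sketch, with a hypothetical function name:

```python
REQUEST_BITS = 1024   # the shared port always requests 1024 bits


def cycles_between_requests(bits_used_per_cycle):
    """Minor cycles a 1024-bit request lasts at the given consumption rate."""
    if bits_used_per_cycle <= 0 or REQUEST_BITS % bits_used_per_cycle:
        raise ValueError("rate must be a positive divisor of 1024")
    return REQUEST_BITS // bits_used_per_cycle


swap_interval = cycles_between_requests(128)          # Swap Unit
index_interval = cycles_between_requests(128)         # GATHER/SCATTER indexes
order_vector_interval = cycles_between_requests(8)    # bit strings
print(swap_interval, index_interval, order_vector_interval)
```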

Memory can be accessed in several bit-group sizes - 
32 bits (half-word), 64 bits (1 word), 1024 bits (16 
words), and 2048 bits (32 words). Each port contains 
logic to tell memory the access width and the address 
of the first half-word to access. All accesses must 
be on proper boundaries for the size requested. 

Thus, for example, a word access must request an even 
half-word address. Even though an entire 1024 bits 
of data may be transferred by a network control, only 
the data requested will make the memory busy. Thus a 
request for a full word of data will make two banks 
of memory busy thereby reducing possible memory 
conflicts. 
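
The boundary rule above can be sketched as a simple alignment test on half-word addresses. The names are hypothetical, and treating "proper boundary" as "address is a multiple of the access size in half-words" is an assumption generalized from the word example in the text; likewise, the busy-bank count is stated in the text only for the word case.

```python
# Access sizes in bits mapped to their length in 32-bit half-words.
HALF_WORDS_PER_ACCESS = {32: 1, 64: 2, 1024: 32, 2048: 64}


def is_properly_aligned(half_word_address, access_bits):
    """True if the first half-word address sits on a boundary for the size."""
    return half_word_address % HALF_WORDS_PER_ACCESS[access_bits] == 0


def banks_made_busy(access_bits):
    """Half-word banks occupied by one access (assumed: one per half-word)."""
    return HALF_WORDS_PER_ACCESS[access_bits]


print(is_properly_aligned(6, 64), is_properly_aligned(7, 64))
print(banks_made_busy(64))
```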

Figure 3.4-4 FMP Memory Interchange and Main Memory 
(memory interface buffer; memory ports; transfer modes: half-word, word, sword) 

3.4.4 Memory Degradation 

If more than the minimum two-Mword memory is 
present, degradation may be selected so that the 
amount of usable memory is less than the total 
memory on the system. The amount of usable memory 
is controlled by three degradation bits from the MCU 
along with a strobe bit. 
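
The three degradation bits correspond to the Memory Size Degrade Code listed in Table 3.6-10 (channel BTA2). A lookup sketch of that code follows; the function name and the tuple layout are illustrative, and code 111 is treated as undefined because the table does not list it.

```python
# Memory Size Degrade Code from Table 3.6-10: code -> (usable Meg, forcing).
DEGRADE_CODES = {
    0b000: (2, None),
    0b001: (2, "Force Section 1 --> Section 0"),
    0b010: (2, "Force Section 2 --> Section 0"),
    0b011: (2, "Force Section 3 --> Section 0"),
    0b100: (4, None),
    0b101: (4, "Force Upper 4 Meg --> Lower 4 Meg"),
    0b110: (8, None),
}


def decode_degrade(code):
    """Return (usable memory in Meg words, forcing note) for a 3-bit code."""
    if code not in DEGRADE_CODES:
        raise ValueError("degrade code not defined in Table 3.6-10")
    return DEGRADE_CODES[code]


print(decode_degrade(0b101))
```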

See Figures 

3.5 I/O Channels 

The FMP is equipped with a basic set of eight I/O 
channels, and includes space for an optional eight 
I/O channels. All input and output is through the 
Backing Store, via the Swap Unit. Figure 3.5-1 
gives a general block diagram of the I/O unit. The 
input/output characteristics are optimized around 
large block transfers of data, since small block data 
handling appears to have a higher overhead associated 
with memory accesses. 

(1) EACH PDC CAN HAVE UP TO 
4 TRUNKS ATTACHED. 

(2) A TRUNK WITH THE ASSOCIATED 
DATA SET AND PDC CONSTITUTES 
A CHANNEL. 


Figure 3.5-1 I/O Unit 

3.5.1 Data Movement 

Data can be transmitted from the I/O Unit at the 
rate of 128 bits every clock cycle. Thus with a 10 
nanosecond clock cycle the total I/O bandwidth is 
12.8 billion bits per second. However, the I/O Unit 
must share the Swap Unit bandwidth with Memory to 
Backing Store swaps, and thus the design goal is for 
an achievable, sustained I/O bandwidth of 3.2 
billion bits per second. 
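
The peak figure above is 128 bits divided by the 10 nanosecond cycle, and the sustained design goal is one quarter of that peak. The short sketch below restates the arithmetic; the variable names are illustrative.

```python
from fractions import Fraction

BITS_PER_CYCLE = 128
CLOCK_CYCLE_NS = 10

peak_bits_per_second = BITS_PER_CYCLE * 1_000_000_000 // CLOCK_CYCLE_NS
sustained_goal_bits_per_second = 3_200_000_000

# Share of the peak rate that the sustained design goal represents.
goal_fraction = Fraction(sustained_goal_bits_per_second, peak_bits_per_second)
print(peak_bits_per_second, goal_fraction)
```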

The I/O Unit is engineered in two parts, CPU and 
peripheral. The CPU portion includes the I/O 
distributor for control between the Scalar Processor 
and PDCs, the I/O buffers and associated channel 
control not shown. The individual CPU-end capability 
for transfers is one 32-bit (plus SECDED) data item 
transferred per minor cycle per channel. The current 
peripheral-end bandwidth is 50 megabits per second, 
thus the CPU-end is not being challenged in the 
initial installation. 

Data is moved between the I/O Unit and the Backing 
Store in 128-bit parcels (quarter-swords). Data is 
moved between the I/O Unit and the peripheral 
sections in 32-bit segments, and control information 
under the command of the monitor mode scalar 
instruction is communicated between the Scalar 
Processor and the peripheral section in 64-bit 
pieces. 


3.5.2 Error Checking 

All data passed through or buffered in the I/O Unit 
contains seven bits of SECDED information for every 
32 bits of data. SECDED checking and generation are 
performed in the PDC (peripheral device controller) 
so that error correction can cover the total path 
from the Swap Unit out to the peripheral sections. 

In the event that a single-bit error is detected by 
the PDC, the error is corrected and the PDC memory 
counter for the failing address plus the syndrome 
bits are locked up for sampling by the MCU. An 
error flag is sent to the MCU indicating which PDC 
discovered the error. 


If a double-bit (uncorrectable) error is discovered, 
the error data is latched up, the MCU is flagged, 
and the PDC attempts to retry the data transfer 
between the Backing Store and the PDC. If after a 
number of attempts (set by the installation) the 
error cannot be recovered, an error message is sent 
down the network data trunk by the PDC, a code word 
is sent to the scalar monitor on the 64-bit trunk, 
and the I/O Unit idles that particular channel. If 
the MCU discovers more than one channel failing it 
will force a job abort of the computation in 
progress and cause the monitor to enter a channel 
diagnostic mode. All stations attached to the trunk 
are alerted to the problem and will take appropriate 
action, including switching to an alternate channel. 

Data transferred onto the data trunk carries one or 
more CRCs (cyclic redundancy codes) for error 
checking. If the PDC receives a faulty 
transmission, it requests a retry of the full block 
on the trunk for a number of times. If the block 
cannot be transmitted the PDC will signal the 
transmitting station and attempt to retry on an 
alternate trunk (each PDC can be attached to up to 
four trunks at a time). The MCU and the 
transmitting stations are all alerted to any errors, 
whether transient or fatal, and through software 
will take appropriate actions. 
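
The retry-then-alternate-trunk behavior described above can be sketched as two nested loops. This is an illustration only: `transmit` is a hypothetical callable standing in for one block transmission whose CRC check passes or fails, and the per-trunk attempt limit is an assumed parameter, since the text says only "a number of times".

```python
def send_block(transmit, trunks, max_attempts):
    """Retry a block on each trunk in turn; return the trunk that succeeded.

    transmit(trunk) is a hypothetical callable returning True when the
    block arrives with a good CRC. Returns None if every trunk fails.
    """
    for trunk in trunks:
        for _ in range(max_attempts):
            if transmit(trunk):
                return trunk
        # Retries exhausted on this trunk: fall through to the alternate.
    return None


# Example: trunk 0 always fails, trunk 1 succeeds on its second attempt.
attempt_log = []


def flaky_transmit(trunk):
    attempt_log.append(trunk)
    return trunk == 1 and attempt_log.count(1) == 2


used_trunk = send_block(flaky_transmit, [0, 1, 2, 3], max_attempts=3)
print(used_trunk, attempt_log)
```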

3.5.3 Addressing 

All data transfers to and from the I/O Unit are by 
128-bit parcels; however, the minimum block size 
transmitted from the Backing Store is defined for 
each channel, but cannot be less than 512 words. 

All data arriving at the I/O Unit is held in a large 
buffer. This buffer can house from 32,768 to 
262,144 words depending on the bandwidth requirements 
of the peripheral subsystem on a given channel. 

Eight channels operating at full rate would require a 
full block to be held at a time per channel to 
minimize interference with other Backing Store 
requests. All data enters the homogeneous memory 
buffer, which is allocated by a local control in the 
I/O Unit. Since a variable amount of buffer can be 
allocated dynamically, the buffer can be of modest 
size, with various I/O channels given large portions 
as they need them for large transfers. 

3.5.4 The PDC 

See report Appendix C 

3.5.5 The Trunk 

See report Appendix D 

3.6 Maintenance Control Unit (MCU) 

The MCU is an autonomous Maintenance Control Unit 
connected to the computer via a CDC FMP I/O channel 
with access to special internal interfaces. These 
interfaces allow it to regulate information flow, 
control pulses, and monitor performances of the 
computer. The MCU consists of a control unit, line 
printer, card reader, and a disc drive and it 
provides for system dead-start and system performance 
monitoring. Special connections to the computer give 
the MCU the capability of monitoring system 
performance. Diagnostics and preventive maintenance 
are facilitated by this section. 

There are three operating modes for the MCU. 

1. The first mode is under operation of a diagnostic 
maintenance program to locate faults and 
   malfunctions within the MCU. 

2. The second mode of operation is running diagnostic 
routines on the FMP. The MCU loads diagnostics, 
ranging from a simple command test to a very 
sophisticated diagnostic catalog routine, controls 
and monitors the operations of the diagnostics, 
and displays the results of the tests via the 
display unit or line printer. 

3. The third mode of operation is System Operation. 
   Here again, the MCU loads the Operating System 
   Software into the FMP and controls and monitors 
   its operation. In this on-line mode of operation, 
   the MCU concerns itself with autoloading the 
   central processor and first level stations, 
   running on-line diagnostics, monitoring CPU 
   faults, and restarting the central processor after 
   hang-ups. 

3.6.1 MCU/CPU Interface 

The MCU connects to the FMP via a network trunk. 

It interfaces 16 buffers, internal to the FMP, 
called MCU/CPU channels - ATB being outgoing buffers 
and BTA being access channels. Tables 3.6-1 through 
3.6-8 show the channels from the CPU to the MCU (ATB) 
and Tables 3.6-9 through 3.6-16 show the channels 
from the MCU to the CPU (BTA). Each table shows the 
channel bit number and function of each bit for a 
channel. 

TABLE 3.6-1 CHANNEL ATB1 

Bit No.   Function 

   0      Bit  0   Current Instruction Address 
   1           1   Register 
   2           2 
   3           3 
   4           4 
   5           5 
   6           6 
   7           7 
   8           8 
   9           9 
   A          10 
   B          11 
   C          12 
   D          13 
   E          14 
   F          15 

TABLE 3.6-2 CHANNEL ATB2 

Bit No.   Function 

   0      Bit 16 
   1          17 
   2          18 
   3          19 
   4          20 
   5          21 
   6          22 
   7          23 
   8          24 
   9          25 
   A          26 
   B          27 
   C          28 
   D          29 
   E          30 
   F          31 

TABLE 3.6-3 CHANNEL ATB3 

Bit No.   Function 

   0      Bit 32   Current Instruction Address 
   1          33   Register 
   2          34 
   3          35 
   4          36 
   5          37 
   6          38 
   7          39 
   8          40 
   9          41 
   A          42 
   B          43 
   C          44 
   D          45 
   E          46 
   F          47 

TABLE 3.6-4 CHANNEL ATB4 

Bit No.   Function 

   0      Bit  0 
   1           1   Display Register - Displays the 
   2           2   register selected by bits C-F of 
   3           3   channel BTA1 in the MCU. 
   4           4 
   5           5 
   6           6 
   7           7 
   8           8 
   9           9 
   A          10 
   B          11 
   C          12 
   D          13 
   E          14 
   F          15 

TABLE 3.6-5 CHANNEL ATB5 

Bit No.   Function 

   0      Bit 16   Display Register - Displays the 
   1          17   register selected by bits C-F of 
   2          18   channel BTA1 in the MCU. 
   3          19 
   4          20 
   5          21 
   6          22 
   7          23 
   8          24 
   9          25 
   A          26 
   B          27 
   C          28 
   D          29 
   E          30 
   F          31 


TABLE 3.6-6 CHANNEL ATB6 

Bit No.   Function 

   0      Bit 32   Display Register 
   1          33 
   2          34 
   3          35 
   4          36 
   5          37 
   6          38 
   7          39 
   8          40 
   9          41 
   A          42 
   B          43 
   C          44 
   D          45 
   E          46 
   F          47 

TABLE 3.6-7 CHANNEL ATB7 

Bit No.   Function 

   0      Bit 48   Display Register 
   1          49 
   2          50 
   3          51 
   4          52 
   5          53 
   6          54 
   7          55 
   8          56 
   9          57 
   A          58 
   B          59 
   C          60 
   D          61 
   E          62 
   F          63 
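
Channels ATB4 through ATB7 each carry 16 bits of the 64-bit display register (bits 0-15, 16-31, 32-47 and 48-63). A minimal sketch of reassembling the full value from four channel reads follows. It assumes that bit 0 is the most significant bit and that each channel value is held as an unsigned 16-bit integer; the tables do not state either point, and the function name is hypothetical.

```python
def assemble_display_register(atb4, atb5, atb6, atb7):
    """Combine four 16-bit channel values into one 64-bit register value.

    Assumption: ATB4 holds the most significant 16 bits (register bits
    0-15) and ATB7 the least significant (register bits 48-63).
    """
    value = 0
    for chunk in (atb4, atb5, atb6, atb7):
        if not 0 <= chunk <= 0xFFFF:
            raise ValueError("each channel value must fit in 16 bits")
        value = (value << 16) | chunk
    return value


register = assemble_display_register(0x0123, 0x4567, 0x89AB, 0xCDEF)
print(hex(register))
```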


TABLE 3.6-8 CHANNEL ATB8 

Bit No.          Function 

0*, 1*, 2, 3*    Memory SECDED Fault or Instruction 
                 Stack Parity 
                 Microcode Parity Fault 
                 Not Used 

   4*            Event Stop 
   5*            Single SECDED Error 
   6             CPU Clock - Used for gating data back 
                 to the CPU. The MCU cannot read 
                 this line. 
   7             Monitor Mode 
   8             Temperature - Dew Point Alarm 
   9             Not Used 
   A             Section Power Fail 
   B             60 Hz Input Power Fail, M.G.1 
   C             60 Hz Input Power Fail, M.G.2 
   D             Not Used 
   E             CPU Idle 
   F 

* These lines indicate why the CPU has stopped. 

3.6.1.2 Channels from MCU to CPU 

TABLE 3.6-9 CHANNEL BTA1 

Bit No.   Function 

   0      MAC Master Clear - Master Clear to Memory 
          Interchange and Main Memory only. 
          This includes the I/O channels. This 
          signal must be set a minimum of 3 
          microseconds. 

   1      Stop - CPU will stop before next 
          instruction issue. 

   2      Step - Execute one instruction. Store 
          the register file and the invisible 
          package (job mode only); then stop. 
          Faults must be cleared before the 
          computer can be stepped. 

   3      Run - Start CPU from manual stop or fault 
          stop. Faults must be cleared before 
          computer can be started. 

   4      Store Register File - The Register File 
          is stored starting at address 
          0000 (base 16) in monitor mode and 
          address 4000 (base 16) in job mode. 

   5      Load Register File - The Register File is 
          loaded starting at address 0000 (base 16) 
          in monitor mode and address 4000 (base 16) 
          in job mode. 

   6      CPU Master Clear - Master Clear to Scalar 
          Unit, Stream, and Floating Point only. 
          Memory Interchange, I/O Channels, and 
          Main Memory are not included. This 
          signal must be set a minimum of 3 
          microseconds. 

   7      Clear Fault Conditions - This signal 
          clears the following conditions and 
          allows the computer to be restarted with 
          a run signal (bit 3): 

          a. SECDED Double Error Condition 
          b. MIC Memory Parity Fault 
          c. Sword Bounds Hit 
          d. The Bounds Hit Address is released. 
          e. Reference to Illegal Address in 
             Stream Microcode. 
          f. Instruction Stack Parity Error 

   8      Clear SECDED Single Error, SECDED Fault 
          Address and Syndrome Bits. 

   9      MCU Sync - This signal is used in the 
          CPU to gate the CPU data back to the MCU. 
          When reading the display registers, the 
          MCU Sync signal must be set after the 
          read signal is set. 

   A      Select SECDED Error Mode Two. 

   B      Read - Transfer selected register and 
          CIAR into the Display Registers. 

  C-F     Display Register Selection 
          See Section 3.6.4.2 

Computer must be stopped before executing these 
commands. 

TABLE 3.6-10 CHANNEL BTA2 

Bit       Function 

          Memory Size Degrade Code 
            000 = 2 Meg Memory 
            001 = 2 Meg Memory, 
                  Force Section 1 --> Section 0 
            010 = 2 Meg Memory, 
                  Force Section 2 --> Section 0 
            011 = 2 Meg Memory, 
                  Force Section 3 --> Section 0 
            100 = 4 Meg Memory 
            101 = 4 Meg Memory, Force 
                  Upper 4 Meg --> Lower 4 Meg 
            110 = 8 Meg Memory 

          Latch Memory Size Code 

          Static Interrupt Gate - When this signal is 
          a "1", time interrupts and external 
          interrupts will only be processed 
          between instructions. 

          Select Mainframe Clock Freq. 
            000 = Nominal 
            001 = Increase clock freq. (1) 
            010 = Decrease clock freq. (1) 
            011 = Select variable freq. 
                  (adjustment on oscillator pak) 
            100 = Increase clock freq. (2) 
            101 = Increase clock freq. (3) 
            110 = Decrease clock freq. (2) 
            111 = Decrease clock freq. (3) 

          NOTE: If clock frequency codes 4-7 
          are used, then code 3 is not 
          available. Either codes 0-3 or 0-2 
          and 4-7 are available. 

   8      Delay Trailing Edge - Delay the 
          trailing edge of all of the clocks 
          on the panel which is specified by 
          bits 11-15 of Channel BTA2. If bit 
          8 and bit 9 are set, only the odd 
          or even clocks on a panel are moved, 
          depending on bit A. 

   9      Delay Leading Edge - Delay the 
          leading edge of all of the clocks 
          on the panel which is specified by 
          bits B-F of Channel BTA2. If bits 8 
          and 9 are set, only the odd or even 
          clocks on a panel are moved, 
          depending on bit A. 

   A      "0" - Move even clocks (see 
          description for bit 8 or 9). 
          "1" - Move odd clocks. 

  B-F     Panel Designator for Clock Margins - Bit 
          B is the left-most bit of the designator. 
          Bit weights: B = 2^4, C = 2^3, D = 2^2, 
          E = 2^1, F = 2^0. 

          The designators are defined below. 

          Designator     Panel(s) 

          (to be supplied later) 

Computer must be stopped before executing these 
commands. 


CONTROL DATA CORPORATION     ENGINEERING SPECIFICATION 

NO. 1035A637 
DATE Dec. 1977 
PAGE 101 
REV. 

R A D L 


TABLE 3.6-10 CHANNEL BTA2 (Cont.) 

Bit       Function 

10-1F     (to be supplied later) 

TABLE 3.6-11 CHANNEL BTA3 

Bit No.   Function 

   0      Not Used. 

   1      Send an external flag on the channel 
          specified by the Channel Select Code in 
          bits 4-8. (*1) (*2) 

   2      Set Channel Disable on the channel 
          specified by the Channel Select Code in 
          bits 4-8. (*1) (*3) 

   3      Clear Channel Disable on the channel 
          specified by the Channel Select Code 
          in bits 4-8. (*1) (*3) 

  4-7     Channel Select Code. A code of 
          0 thru F (base 16) selects a channel 
          (0 thru 15, base 10) for the operation 
          specified in bits 1, 2 and 3. (*1) Bit 
          7 of BTA3 is bit 3 of the Channel Select 
          Code. 

   8      Select All Channels (0 thru 15, base 10) 
          for the operation specified in bits 1, 2 
          and 3. (*1) 

   9      Stop on SECDED Single Error Detection. 

   A      Disable Stop on SECDED Double Error 
          Detection. 

   B      Block External Interrupt 

   C      Disable Error Correction on all Read 
          Buses. 

   D      Swap Register File Read on Exchange. 

   E      Not Used 

   F      Not Used. 

(*1) The Channel Select Code bits 4-8 must be set 
     before any commands are sent, and it must remain 
     set until after the command has dropped. 

(*2) The External Flag is transmitted to the device 
     on the I/O channel corresponding to the code in 
     bits 4-8. External Flag instructs the device to 
     autoload. 

(*3) The Channel Disables are transmitted to the I/O 
     Unit. If the disable line for a channel is set, 
     no backing store references will be allowed from 
     that channel. Data transfers can proceed in and 
     out of the channel buffer in an end-around type 
     of operation. 


TABLE 3.6-12 CHANNEL BTA4 

Bit       Function 

   0      Checkword bit 0 
   1                    1 
   2                    2   Used for toggling I/O 
   3                    3   Checkword bits 0-6 
   4                    4 
   5                    5 
   6                    6 
   7      Block Write Enable on SECDED Error 
   8      Not Used 
   9      Not Used 
   A      Force Reg. File Store at bit address 
          20,000 on Initial Exchange 
   B      Force Instruction Stack Parity 
   C      Enable I/O Simulator 
   D      Initiate I/O Simulator on Channel Flag 
   E      Not Used 
   F      Not Used 
3.6.1.2 (Cont.)

TABLE 3.6-13  CHANNEL BTA5

Bit    Function

0-4    Not Used

5-7    Bounds Limit Load Code
         0 = Null
         1 = Load Bits (35-42)  Upper Bounds
         2 = Load Bits (51-58)  Upper Bounds
         3 = Load Bits (43-50)  Upper Bounds
         4 = Null
         5 = Load Bits (51-58)  Lower Bounds
         6 = Load Bits (35-42)  Lower Bounds
         7 = Load Bits (43-50)  Lower Bounds

8-F    Bounds Address Bits

       Address bits are loaded as follows, starting and ending with
       a Null Code:

         Code = 0   Null
         Code = 1   Set up Bits (35-42)  Upper Bounds
         Code = 3   Set up Bits (43-50)  Upper Bounds
         Code = 2   Set up Bits (51-58)  Upper Bounds
         Code = 6   Set up Bits (35-42)  Lower Bounds
         Code = 7   Set up Bits (43-50)  Lower Bounds
         Code = 5   Set up Bits (51-58)  Lower Bounds
         Code = 4   Null

       Bounds limits are absolute physical half-word addresses.
       Bits (35-36) and (55-58) must be zero.

       Due to the operational characteristics of the maintenance
       interface, only one bit of the code can be changed at one
       time. Address bits must be loaded in such a manner as to
       leave the Load Code bits undisturbed. Address bits are
       transferred on the leading edge of a code change; the
       address bits must be set up before a code change occurs.
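
The load order in Table 3.6-13 (0, 1, 3, 2, 6, 7, 5, 4) changes exactly one bit of the 3-bit Load Code per step, which is what the maintenance interface requires. The following is a minimal sketch that checks this property; the names LOAD_SEQUENCE, FIELDS and single_bit_steps are illustrative and do not appear in the specification.

```python
# Load-code order taken from Table 3.6-13 (starts and ends on a Null code).
LOAD_SEQUENCE = [0, 1, 3, 2, 6, 7, 5, 4]

# Field loaded by each non-null code, as listed in the table.
FIELDS = {
    1: ("upper", (35, 42)),
    3: ("upper", (43, 50)),
    2: ("upper", (51, 58)),
    6: ("lower", (35, 42)),
    7: ("lower", (43, 50)),
    5: ("lower", (51, 58)),
}


def bits_changed(a: int, b: int) -> int:
    """Number of bit positions in which two codes differ."""
    return bin(a ^ b).count("1")


def single_bit_steps(seq) -> bool:
    """True if every consecutive pair of codes differs in exactly one bit."""
    return all(bits_changed(a, b) == 1 for a, b in zip(seq, seq[1:]))
```

A plain ascending order (0, 1, 2, 3, ...) would fail this check at the step from 1 to 2, where two bits change at once.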
3.6.1.2 (Cont.)

TABLE 3.6-14  CHANNEL BTA6

Bit    Function

0      Check bounds on memory reads        )  If bits 0 and 1 or
1      Check bounds on memory writes       )  bits 2 and 3 are
2      Check bounds on CPU references      )  zero, no bounds hits
3      Check bounds on channel references  )  can occur.
4      Stop CPU on bounds hit
5      Enable bounds check - The bounds addresses and conditions
       must be set up before the enable is set.
6      Count A - Monitoring Counter A is enabled while this line is
       a "1" and held clear when this line is a "0". The proper
       counter specification and bits 8-E of channel BTA6 must not
       change while this line is up.
7      Count B - Monitoring Counter B is enabled while this line is
       a "1" and held clear when this line is a "0". The proper
       counter specification and bits 8-E of channel BTA6 must not
       change while this line is up.
8      Clear counter (see code 6 in Section 3.6.4.2).
9      Stop CPU on Counter A Increment     )  See Section
A      Stop CPU on Counter B Increment     )  3.6.4.1.3
(continued) 
3.6.1.2 (Cont.)

TABLE 3.6-14  CHANNEL BTA6 (Cont.)

Bit    Function

B      Enable Carry into A1    )
C      Enable Carry into A2    )  See Section
D      Enable Carry into B1    )  3.6.4.1.2
E      Enable Carry into B2    )
F      "0" - Load Counter A Event Selects and Gates (Channel BTA7
       Bits 0-F).
       "1" - Load Counter B Event Selects and Gates (Channel BTA7
       Bits 0-F).
       This bit should be set to the proper counter before the
       count specification is set into Channel BTA7.

(continued)
(Cont.)

TABLE 3.6-15  CHANNEL BTA7

Function

Event Select for Counter A1 and B1. See Section 3.6.4.1 for codes.
Event Select for Counter A2 and B2. See Section 3.6.4.1 for codes.
Not Used
Selected Job Gate      )
Monitor Mode Gate      )
Job Mode Gate          )  Counter Gates.
Data Flag 56 Gate      )  See Section 3.6.4.1.1.
Data Flag 57 Gate      )
MCU Event              )
TABLE 3.6-16  CHANNEL BTA8

Bit    Function

0-7    8-bit function select code. Bit 0 is the left-most bit of
       the code. See event code 12 (hexadecimal) in Section
       3.6.4.1.
8-F    8-bit function mask. Bit 8 is the left-most bit of the
       mask. See event code 12 (hexadecimal) in Section 3.6.4.1.
3.6.2 MCU/Microcode Memory Interface

Upon power up of the FMP all microcode memory contents are
undefined, since that memory is built of RAM circuits with
volatile storage. Each of the FMP microcodes can be loaded by an
MCU function which is sent over the FMP I/O channels from one of
the processors acting as the MCU. A special trunk address
identifies the special I/O channel which does not transfer data to
the Backing Store, but instead provides control information for
the FMP and retrieves status information from one or more of the
internal maintenance channels contained within the FMP. One of the
maintenance functions is the loading of microcode to each of the
microcode memories. Each block of microcode received by the MCU
interface is checked for data errors (using the CRC code in the
trunk message) and sent to its respective microcode memory system.

Each block is preceded by a unique 16-bit address which identifies
the particular microcode destination.


3.6.2.1 Microcode Units and Addresses

(to be supplied later)


3.6.2.2 Microcode Error Checking

Under control of the MCU interface control signals, a microcode
memory can be loaded with data from the trunk. The data carries
its own parity bits (one per word) which are generated by the
assembler at the time the microcode is created. This block can be
read out of each microcode memory sequentially by the MCU
interface so that the memory can be checked. Each word read is
parity checked and, if an error occurs, the location of the
failing word is unloaded by the MCU interface via the P counter of
that microcode.

During normal startup procedures, each microcode memory is loaded
in turn with its unique microcode and the entire contents are
swept out on an MCU-controlled, sequential read operation to
verify the integrity of that memory.

( continued) 
3.6.2.2 (Cont.)

During operation of the FMP, each microcode access is parity
checked. If a parity error occurs in any microcode, the MCU is
signalled via the network trunk and the FMP CPU is stopped as soon
as possible. The location of the error (P counter) and the address
of the failing microcode unit are then provided to the MCU
interface for transmission to the MCU processor on the trunk.


3.6.2.3 MCU Interface Channel Bits

(to be supplied later)


3.6.3 Microcode Memory Channel Programming

(to be supplied later)


3.6.3.1 N/A


3.6.3.2 N/A
Typical Microcode Interface Function Codes

For all channel functions the address that accompanies the
function and the null function are ignored. The following 3-bit
function codes control the microcode memory:

TABLE 3.6-17  TYPICAL FUNCTION CODES (MIC. MEM.)

Bit 0  Bit 1  Bit 2   Function

  0      0      0     Null - Automatically sent by the MCU interface
                      as the second half of any other function.
  0      0      1     Read Memory - Read a block of microcode memory
                      from the current microcode "P" address.
  0      1      0     Write Memory - Write a block of microcode
                      memory from the current microcode "P" address.
  0      1      1     Not normally used but will perform the same as
                      an EOP.
  1      0      0     Data - Automatically sent with the data during
                      a write microcode memory operation.
  1      0      1     Read Status - Read the current microcode
                      status. See Section 3.6.3.5 for explanation.
  1      1      0     Write Switches - The switches provide control
                      of microcode execution. See Section 3.6.3.4.
  1      1      1     EOP - End of Operation clears the interface of
                      all previous functions and also clears the
                      counter that controls the data fan-in and
                      fan-out to/from the channel.
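
The eight function codes can be written down as a small enumeration. This is only an illustrative sketch: the class and member names are invented here, and it assumes that Bit 0 is the left-most (most significant) bit of the 3-bit code, consistent with the 110 (write switches) and 101 (read status) codes quoted in Sections 3.6.3.4 and 3.6.3.5.

```python
from enum import IntEnum


class MicFunction(IntEnum):
    """3-bit microcode interface function codes (names are illustrative)."""
    NULL = 0b000
    READ_MEMORY = 0b001
    WRITE_MEMORY = 0b010
    EOP_ALIAS = 0b011       # not normally used; behaves like EOP
    DATA = 0b100
    READ_STATUS = 0b101
    WRITE_SWITCHES = 0b110
    EOP = 0b111


def decode(bit0: int, bit1: int, bit2: int) -> MicFunction:
    """Combine the three code bits, treating bit 0 as the left-most bit."""
    return MicFunction((bit0 << 2) | (bit1 << 1) | bit2)
```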
3.6.3.4 Microcode Switches

Microcode switches are 1-bit control terms used to control the
microcode memory. Each switch is one bit of the Write Switch
Control Word. The 110 function code (write switch) causes the
microcode memory to store the Write Switch Control Word in a
register.

The MCU interface receives this data from the I/O trunk and sends
it to the microcode control. The following is a definition of each
switch function and a description of its use.

1. Switch Function Definitions

TABLE 3.6-18  MICROCODE SWITCH FUNCTIONS

Bit    Function

0      Go Microcode - Strobing this bit will cause microcode to
       start execution at the current microcode "P" address.
1      Kill - Setting this bit will stop any microcode
       instructions executing at the time the bit is set. The
       instruction will come to a normal halt with "P" pointing to
       the next word to be executed. Execution can be resumed by
       setting bit 0.
2      Sense Switch - Any microcode program can sense the
       condition of this switch for program control (used mainly
       by diagnostics).
3      P to 0 - Strobing this bit will force the "P" register to
       zero. Kill should be set either previously or in the same
       word so as to come to a normal halt.
4      Clear Checkpoint - Strobing this bit will clear the
       checkpoint flip-flop.
5      Drop Control - Setting this bit disables control of the CPU
       and the I.C.s from microcode. This will prevent undefined
       CPU operation due to a microcode memory test.

(continued) 
3.6.3.4 (Cont.)

TABLE 3.6-18  MICROCODE SWITCH FUNCTIONS (Cont.)

Bit    Function

6      Change Status Word 2 Definition - Bits 8-F of Status Word 2
       become bits 0-7 of an IC register. See Section 3.6.3.5.
7      Enable control of the register logical pipe from microcode.
8      Function for Scalar Microcode not yet defined.
9      Write Scalar Microcode - Must be set to write Scalar
       Microcode.
A      Sweep Scalar Microcode - Sweeps Scalar Microcode; disables
       microcode write enables.
B      1 = Enables Scalar Microcode to sweep PM00
       0 = Enables Scalar Microcode to sweep PM01
C-F    Functions for Scalar Microcode not yet defined.

(continued)
(Cont.)

2. Use of Switch Functions

   1. Switch Functions 0, 3 and 4 are one-shot functions. This is
      accomplished by having the required bit set in the even
      16-bit word of a transfer and clear in the odd 16-bit word.

      If the bit is set into both halves of a 32-bit transfer, for
      instance, the function will be performed in that transfer but
      will possibly be ignored if sent in the next transfer.

   2. Switch Functions 0 and 3 are delayed by one cycle so that
      other functions sent in the same data word have time to
      propagate, i.e., "kill" and "P to 0" together are legal, as
      are "sense switch" and "go microcode". Other combinations are
      also legal.

   3. Switch Functions 1, 2, 6, 7, A and B are latching functions
      that are caught and held until another function is sent.
      Note, however, that a single function consists of two or more
      data transfers, each transfer clearing and loading over
      previous data transfers. A switch that is meant to be valid
      both during and after the function must therefore be sent in
      both halves of a 32-bit data transfer, and any latching
      function that is supposed to remain valid through another
      "send switches" function must be sent again with that
      function, again present in both halves of the 32-bit data
      word.
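
The one-shot and latching rules above can be sketched as a helper that assembles one 32-bit switch transfer. Everything below is an assumption made for illustration: bit 0 is taken to be the left-most bit of each 16-bit word, the even 16-bit word is taken to be the upper half of the 32-bit transfer, and the function names are hypothetical.

```python
ONE_SHOT = {0, 3, 4}                  # set in the even word only
LATCHING = {1, 2, 6, 7, 0xA, 0xB}     # set in both halves


def bit_mask(bit: int) -> int:
    """Mask for a switch bit, assuming bit 0 is the left-most of 16 bits."""
    return 1 << (15 - bit)


def switch_transfer(bits) -> int:
    """Build a 32-bit transfer: even word in the upper half, odd in the lower."""
    even = odd = 0
    for b in bits:
        if b in ONE_SHOT:
            even |= bit_mask(b)       # clear in the odd word
        elif b in LATCHING:
            even |= bit_mask(b)
            odd |= bit_mask(b)        # repeated so the switch stays valid
        else:
            raise ValueError(f"switch bit {b:X} has no defined behavior here")
    return (even << 16) | odd
```

For example, sending "kill" (bit 1) together with "P to 0" (bit 3) places both bits in the even word and only the latching "kill" bit in the odd word.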
3.6.3.5 Stream Microcode Status

The input of status to the MCU can be of any number of words, but
all words after the first word will be word 2 of the status.

The input of status does not have any effect on microcode or
microcode controls.

Upon the receipt of a 101 (read status) channel function code, the
MCU interface will load the channel with the following status
words.

TABLE 3.6-19  MICROCODE STATUS

Bits (Word 1)    Meaning

Checkpoint - Software uses this bit to indicate to the MCU that
the microcode has reached some predefined status, found an error,
or reached some predefined address for debugging, for example.

Flags - The current state of flags 0, 1, 2, 3.

P (*) - The current state of the P (microcode address) register.

Bits (Word 2)    Meaning

Run - This bit will be used to indicate the microcode is
executing.

J1 - The current state of the least significant 4 bits of the J1
register.

J2 - The current state of the J2 register. (See bit 6 of the
switch function control word.)

(*) The contents of P do not indicate the address at which
    microcode has stopped until the second minor cycle after the
    RUN bit has gone to zero. Thus it is necessary to read the
    status word twice, once to determine that microcode is not
    running, and once to read P.
3.6.3.6 Interface Sequences

After selection of the MCU interface the following are examples of
possible control sequences.

TABLE 3.6-20  INTERFACE SEQUENCES

Step  Code  Sequence (Stream Units Write Microcode)

A     111   EOP - To clear the interface. Initiate (bit 0) should
            not be sent with any EOP function.
B     010   Write Mode - The address with this function is ignored;
            the write will proceed from the current P address.
C     100   Data - Data sent to microcode must (except on the last
            transfer) be sent in integer multiples of microcode
            words. One microcode word is 14 16-bit transfers. Data
            will be lost and/or rearranged if this is not observed.
D     111   EOP
E           Repeat from Step B as many times as necessary to
            complete transfer of the block of data.

Step  Code  Sequence (Stream Units Read Microcode)

A     111   EOP
B     001   Read Mode - The address is ignored.
C           Input the data. The same caution as in Write Microcode
            Step C applies. Data starts from the current microcode
            P address.
D     111   EOP
E           Repeat from Step B as many times as necessary.

Note: If the last operation performed in a sequence is an EOP, the
next sequence does not have to start with another EOP.

(continued)
3.6.3.6 (Cont.)

After selection of the MCU interface the following are examples of
possible control sequences.

TABLE 3.6-20  INTERFACE SEQUENCES (Cont.)

Step  Code  Sequence (Stream and Scalar Write Switches)

A     111   EOP
B     110   Set Switch Mode - The address is ignored.
C     100   Data - Although one 32-bit transfer is the normal data
            length, there is no restriction on data length if the
            extra data length can be useful - repeated starts for
            instance.
D     111   EOP

Step  Code  Sequence (Stream Read Status)

A     111   EOP
B     101   Set Status Mode - The address is ignored.
C           Input Data - All data after the first word is status
            word 2.
D     111   EOP

Note: If the last operation performed in a sequence is an EOP, the
next sequence does not have to start with another EOP.

(continued)
3.6.3.6 (Cont.)

TABLE 3.6-20  INTERFACE SEQUENCES (Cont.)

Step  Code  Sequence (Write Scalar Microcode)

A     111   EOP - To clear the interface.
B     010   Write Mode - Bits 0-8 of the second 16 bits of the
            address select daughter boards 0-8, respectively. The
            first 16 bits of the address are ignored. The write
            will proceed from the current P address.
C     100   Data - Bits 0-3 are Write Enables and bits 4-15 are
            Data. The microcode address is incremented by one for
            each 16-bit quantity sent by the MCU.
D           Repeat step C until the selected Auxiliary Board has
            been loaded (normally 1024 16-bit words).
E     111   EOP
F           Repeat from step B to load other Auxiliary Boards.

Note: If the last operation performed in a sequence is an EOP, the
next sequence does not have to start with another EOP.
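
As an illustration of how the steps in Table 3.6-20 chain together, the sketch below builds the list of (function code, payload) steps for a stream write. The function name and the tuple representation are assumptions made for this example; the codes themselves are those given in the table.

```python
EOP = 0b111
WRITE_MODE = 0b010
DATA = 0b100


def write_sequence(blocks):
    """Return (function_code, payload) steps for writing blocks of data.

    `blocks` is a list of lists of 16-bit transfers; each block is sent
    as Write Mode, Data..., EOP (Steps B through D), repeated per block.
    """
    steps = [(EOP, None)]                        # Step A: clear the interface
    for block in blocks:
        steps.append((WRITE_MODE, None))         # Step B: address is ignored
        steps.extend((DATA, w) for w in block)   # Step C: the data itself
        steps.append((EOP, None))                # Step D
    return steps
```

Because each block already ends on an EOP, the next block starts directly at Step B, which matches the note under the table.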
3.6.3.7 Writing or Sweeping Scalar Microcode Memories

The scalar microcode consists of 5 memories: PM00, PM01, HM00,
DM00 and GM00. Although each operates independently during CPU
instruction execution, they are all addressed simultaneously
during writing or sweeping operations.

3.6.3.7.1 Scalar Microcode Memory Write Operations

For write operations, the write enables at each auxiliary board
control which auxiliary board and which address within an
auxiliary board is to be written. Since 12 bits of data are
written at a time, the write enables are also responsible for
choosing which 12-bit portion of a microcode address is to be
written.

Under the control of the write enables and auxiliary board select,
one auxiliary board is written at a time. The address registers on
the auxiliary boards will first be set to 00 (hexadecimal) and
then cycled thru FF (hexadecimal) and then back to 00
(hexadecimal) while writing one-fourth (or twelve bits) of an
auxiliary board. The write enable will then change to address the
next twelve bits of the particular auxiliary board and the address
register will again cycle through all addresses. This operation
will occur four times on each of the 9 auxiliary boards.

It is possible, by bringing up all 9 auxiliary board selects and
all write enable bits, to write all bits of all auxiliary boards
in one write of 100 (hexadecimal) words (except PM00/PM01). Each
12-bit segment of the 48-bit word will be duplicated. This would
be done only as a maintenance aid for pattern generation during
either write or sweep operations.
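
The four-pass write described above accounts for the 1024 transfers per auxiliary board quoted in Table 3.6-20 (4 passes of 256 addresses). The sketch below generates those transfers from 256 48-bit words. It is a sketch under stated assumptions: bit 0 is taken to be the left-most bit of each 16-bit transfer, write-enable bit k is assumed to select the k-th 12-bit segment counted from the left of the 48-bit word, and the function name is hypothetical.

```python
def board_transfers(words48):
    """Return the 1024 16-bit transfers that load one auxiliary board.

    Each transfer carries 4 write-enable bits (bits 0-3) followed by
    12 data bits (bits 4-15). One pass over all 256 addresses is made
    for each of the four 12-bit segments.
    """
    if len(words48) != 256:
        raise ValueError("an auxiliary board holds 256 addresses")
    transfers = []
    for segment in range(4):
        enable = 1 << (3 - segment)       # one write enable per pass
        shift = 36 - 12 * segment         # position of this 12-bit segment
        for word in words48:
            data = (word >> shift) & 0xFFF
            transfers.append((enable << 12) | data)
    return transfers
```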
3.6.3.7.2 Scalar Microcode Memory Sweep Operations

Sweeping of the scalar microcode memory is an operation to be done
to detect a parity error on any of the 9 microcode auxiliary
boards. The operation simply consists of referencing all 9
auxiliary boards simultaneously with the same address register.
Since there is one parity bit per auxiliary board per microcode
memory, any parity error or errors will be isolated to the failing
auxiliary board or boards.

The control signals necessary to perform the sweep operation are
Sweep (switch function bit A), Enable PM00/PM01 (switch function
bit B) and Clear Fault. Sweep should be enabled during the entire
sweep operation. Enable PM00/PM01 selects PM00 or PM01 microcode
memories and Clear Fault will clear any parity errors caused by
the sweep operation. If Clear Fault is sent while Sweep is still
set, not only will the parity fault condition be cleared but
sweeping will continue. However, since the sweep address, upon a
parity fault, is 3, 4 or 5 addresses ahead of the actual parity
fault address, sweeping immediately after a parity fault will
"skip" 3, 4 or 5 addresses respectively. For example, if the
parity fault address is 22 on PM00 then addresses 23, 24 and 25
will be skipped.

The register used to reference all of the auxiliary boards during
the sweep operation is cleared before and after the sweep
operation. Thus, sweeping starts with address zero and, because of
the time delay in detecting parity errors, will end beyond that
address which caused the parity error. For example, if the parity
error occurred on HM00 at address 125 then the address displayed
at the MCU will be 130 and is therefore 5 ahead.

The following list specifies how far ahead of the parity fault
address the sweep address will be.

     GM00    5 ahead
     DM00    4 ahead
     HM00    5 ahead
     PM00    3 ahead
     PM01    3 ahead
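
The correction from displayed sweep address to actual fault address is a fixed subtraction per memory. A minimal sketch follows, using the lead values from the list above; the function names are illustrative, and wrap-around of the 8-bit address register is not modeled.

```python
# How far the displayed sweep address runs ahead of the fault address.
SWEEP_LEAD = {"GM00": 5, "DM00": 4, "HM00": 5, "PM00": 3, "PM01": 3}


def fault_address(displayed: int, memory: str) -> int:
    """Actual parity fault address given the address displayed at the MCU."""
    return displayed - SWEEP_LEAD[memory]


def skipped_addresses(fault: int, memory: str):
    """Addresses passed over if sweeping resumes right after a fault."""
    lead = SWEEP_LEAD[memory]
    return list(range(fault + 1, fault + 1 + lead))
```

With the two examples from the text, a displayed address of 130 on HM00 maps back to 125, and a fault at 22 on PM00 skips 23, 24 and 25.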
3.6.4 Monitoring System Activity by the MCU

The MCU monitors the output of two display registers as its main
medium of monitoring system activity.

One display register contains the output of the Current
Instruction Address Register (CIAR). The other display register
contains the output of the register selected by the MCU. A 4-bit
code sent from the MCU selects which register the display register
will present. In addition to monitoring the display register, the
MCU can also monitor the microcode memory status and other CPU
status.

3.6.4.1 Monitoring with Counters

For monitoring purposes, the CPU has four 16-bit counters. Each of
these counters can be connected to an event line selected by a
command from the MCU.

See Figures 3.6-1 and 3.6-2. A list of events which can be counted
and their corresponding select codes is given in Table 3.6-21. For
purposes of discussion, one pair of 16-bit counters is referred to
as Counters A1 and A2. The other pair is labeled B1 and B2.
Counter A and Counter B are completely independent and cannot be
tied together; however, they do share the same input event lines
and gate lines. The counters can be read by selecting them for
input into the MCU display register. They can also be combined in
various ways to form one or two 32-bit counters. This
reconfiguration is accomplished via the carry lines from the MCU.
The counters are enabled by a number of hardware and software
gates selected with a mask from the MCU. The MCU has the option of
stopping the CPU on a count condition. This option is exercised by
use of the stop lines.


(continued)



CONTROL DATA CORPORATION     ENGINEERING SPECIFICATION
NO. 10354637   DATE Dec. 1977   PAGE 122   REV.
RADL


3.6.4.1 (Cont.)

Figure 3.6-1  Block Diagram of Counter Logic Lines

(continued)
3.6.4.1 (Cont.)

[Figure 3.6-2  Block Diagram of Counter A. CPU inputs: events,
invisible package counter enable bit, job mode, monitor mode, data
flag bit 56, data flag bit 57. MCU inputs: count specification for
Counter A (Event Select A1, Event Select A2), MCU gates (Selected
Job Gate, Job Mode Gate, Monitor Mode Gate, Data Flag 56 Gate,
Data Flag 57 Gate), Count A, Enable Carry into A1, Enable Carry
into A2, Stop CPU on Counter A Increment. Blocks shown: Selection
Network A1, Selection Network A2, event mask, event count lines,
16-bit Counter A1, 16-bit Counter A2.]

(continued)
(Cont.)

TABLE 3.6-21  COUNTER EVENTS

Codes (hexadecimal)*
Counter A1/B1    Counter A2/B2    Events

Number of branches out of instruction stack.

Number of branches in instruction stack.

Number of times microcode field MON = 1 is selected.

Number of shortstop path usages.

Not Used

Not Used

Number of normal channel backing store requests.

Number of normal channel backing store requests accepted.

Number of CPU memory requests.

Number of CPU memory requests accepted.

Total number of memory requests.

Total number of memory requests accepted.

Number of minor cycles from selected instruction issue to next
non-selected issue. The counter will begin counting when an
instruction whose function code meets the conditions described in
code 12 below is loaded into IR0. It will stop counting when the
first following instruction which does NOT meet the conditions is
loaded into IR0.

* These are 5-bit codes, expressed in hexadecimal.

(continued)
3.6.4.1 (Cont.)

TABLE 3.6-21  COUNTER EVENTS (Cont.)

Codes (hexadecimal)*
Counter   Counter
A1/B1     A2/B2     Events

12                  Number of times a particular function code or a
                    particular category of function codes is
                    executed. The count condition is determined by
                    an 8-bit select code and an 8-bit mask sent to
                    the CPU on MCU channel BTA8. If the select code
                    bits and the corresponding instruction function
                    code bits are equal wherever there is a "1" in
                    the mask, the counter will be incremented. If
                    the mask contains all zeros, all instructions
                    will be counted.

          12        Time - 1 MHz.

13                  Time between selecting microcode monitor field
                    MON=2 and selecting MON=3.

          13        Number of cycles where data is not available at
                    the output of a functional unit once data has
                    been requested for all input streams. This time
                    does not include the time required for initial
                    setup (preceding the input of the last operands
                    to a functional unit). This count thus permits
                    the programmer to analyze the amount of time
                    required for startup memory accesses,
                    pipeline/functional unit length, and memory
                    conflicts for a specific instruction.

* These are 5-bit codes, expressed in hexadecimal.
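
The select-code and mask comparison described for event code 12 is a masked equality test. The sketch below assumes the channel BTA8 word carries the select code in its left-most 8 bits (bits 0-7) and the mask in its right-most 8 bits (bits 8-F), per Table 3.6-16; the function names are illustrative only.

```python
def split_bta8(word: int):
    """Split a 16-bit BTA8 word into (select_code, mask)."""
    return (word >> 8) & 0xFF, word & 0xFF


def function_selected(function_code: int, select: int, mask: int) -> bool:
    """True if the function code equals the select code wherever the mask is 1."""
    return ((function_code ^ select) & mask & 0xFF) == 0
```

A mask of F0 (hexadecimal) therefore counts every instruction whose function code shares its upper four bits with the select code, and an all-zero mask counts every instruction.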
3.6.4.1.1 Count Gates and CPU Lines

The counters are incremented when the selected event occurs, the
count line is up, and one or more of the following gate-line
conditions is satisfied:

1. The Event Counter Enable bit is set in the invisible package of
   the job currently being executed and the Selected Job Gate from
   the MCU is set. This allows counts to be made during selected
   jobs only.

2. The CPU is in job mode and the Job Mode Gate from the MCU is
   set.

3. The CPU is in monitor mode and the Monitor Mode Gate from the
   MCU is set.

4. Data flag bit 56 (or 57) is set in the Data Flag Register of
   the CPU and the data flag 56 (or 57) gate from the MCU is set
   and the CPU is in monitor mode.

5. Data flag bit 56 (or 57) is set in the Data Flag Register of
   the CPU and the data flag 56 (or 57) gate from the MCU is set
   and the Event Counter Enable bit is set in the invisible
   package of the job currently being executed.

There is one set of gate-line enable logic for Counters A1 and A2
and one set for Counters B1 and B2; therefore, Counter A may be
enabled by different gates than Counter B.

In summary the CPU lines are:

1. Data flag bit 56.

2. Data flag bit 57.

3. Monitor mode.

4. Job mode.

5. Job enable of monitoring counters from invisible package.

There is a corresponding MCU gate for each of the above.
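
The five gate-line conditions can be restated as a Boolean function of the CPU lines and the MCU gates. This is a sketch of the logic as worded above, not of the hardware; the class and field names are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class CpuLines:
    job_counter_enable: bool = False   # Event Counter Enable, invisible package
    job_mode: bool = False
    monitor_mode: bool = False
    flag56: bool = False
    flag57: bool = False


@dataclass
class McuGates:
    selected_job: bool = False
    job_mode: bool = False
    monitor_mode: bool = False
    flag56: bool = False
    flag57: bool = False


def gates_satisfied(cpu: CpuLines, gates: McuGates) -> bool:
    """True if at least one of gate-line conditions 1 through 5 holds."""
    flag_hit = (cpu.flag56 and gates.flag56) or (cpu.flag57 and gates.flag57)
    return (
        (cpu.job_counter_enable and gates.selected_job)    # condition 1
        or (cpu.job_mode and gates.job_mode)               # condition 2
        or (cpu.monitor_mode and gates.monitor_mode)       # condition 3
        or (flag_hit and cpu.monitor_mode)                 # condition 4
        or (flag_hit and cpu.job_counter_enable)           # condition 5
    )


def counter_increments(event: bool, count_line: bool,
                       cpu: CpuLines, gates: McuGates) -> bool:
    """A counter steps only on an event, with its count line up and a gate open."""
    return event and count_line and gates_satisfied(cpu, gates)
```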


3.6.4.1.2 Carry Lines

There is one enable carry line associated with each 16-bit
counter. Enable carry line A1 enables the carry into Counter A1
from Counter A2. Enable carry line A2 enables the carry into
Counter A2 from A1. There are equivalent lines for the B Counter.
A zero on carry lines A1 and A2 allows the Counters to operate as
two 16-bit counters. Only half of the total number of events are
available at the selection network for one Counter A1 or A2;
therefore, if a 32-bit count is desired, either counter may have
the lower bits of count. For example, if an event is enabled to
Counter A1 and a 32-bit count is desired, then enable carry line
A1 must equal "0" and enable carry line A2 must be a "1". In this
example, Counter A1 will have the least significant bits and
Counter A2 will have the most significant.
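
The 32-bit example in this section can be modeled with two 16-bit counters and one carry enable. The class below is a behavioral sketch only, with hypothetical names; it covers the case where the event is enabled to the first counter of the pair and the carry into the second counter is switched by its enable line.

```python
class CounterPair:
    """Two 16-bit counters; the event drives counter 1, carry feeds counter 2."""

    def __init__(self, carry_into_2: bool = False):
        self.carry_into_2 = carry_into_2
        self.c1 = 0   # least significant half in the 32-bit configuration
        self.c2 = 0   # most significant half in the 32-bit configuration

    def event(self) -> None:
        """Count one event on counter 1, propagating a carry if enabled."""
        self.c1 = (self.c1 + 1) & 0xFFFF
        if self.c1 == 0 and self.carry_into_2:
            self.c2 = (self.c2 + 1) & 0xFFFF

    def value32(self) -> int:
        """Combined count, counter 2 high and counter 1 low."""
        return (self.c2 << 16) | self.c1
```

With the carry enable at "0" the first counter simply wraps at 65,536 events and the second counter is left untouched.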

3.6.4.1.3 Stop Lines

There is one stop line associated with each counter pair, one for
the A Counters and one for the B Counters. When the stop line is a
"1", an event incrementing either 16-bit counter will stop the
computer. Mode line "Event Stop" is returned to the MCU (bit 4,
channel ATB8) to show why the CPU has stopped. The MCU, after
sending a "Clear Fault Signal", may restart the CPU.

3.6.4.1.4 Counter Setup

Typically, the four counters would be set up by the MCU as
follows:

1. Set the following bits as required:

   a. Stop CPU on A Increment (bit 9, channel BTA6)
   b. Stop CPU on B Increment (bit A, channel BTA6)
   c. Enable carry into A1 (bit B, channel BTA6)
   d. Enable carry into A2 (bit C, channel BTA6)
   e. Enable carry into B1 (bit D, channel BTA6)
   f. Enable carry into B2 (bit E, channel BTA6)

2. With bit F, channel BTA6, a zero, set event and mask selection
   for Counter A into channel BTA7.

3. Set bit F, channel BTA6 to a "1".

4. Set event and mask selection for Counter B into channel BTA7.


3.6.4.1.4 (Cont.)

5. If A1/B1 event code 12 for function counting has been selected,
   set channel BTA8 to the desired function and mask.

6. Set count line A or B (bit 6 or 7, channel BTA6) as desired.

The counters will now be counting events and will continue to
count until their respective count lines are dropped.
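
The six setup steps can be sketched as an ordered list of channel writes. This is an illustration under several assumptions that the specification does not state: bit 0 is taken as the left-most bit of each 16-bit channel word, the counter-select bit F is left at "1" when the count line is raised, and the function and argument names are hypothetical.

```python
def bit(n: int) -> int:
    """Mask for channel bit n, assuming bit 0 is the left-most of 16 bits."""
    return 1 << (15 - n)


def counter_setup(options: int, spec_a: int, spec_b: int,
                  count_bits: int, function_and_mask=None):
    """Return the ordered (channel, word) writes for steps 1 through 6.

    `options` holds the step 1 bits (9 thru E of BTA6); `count_bits`
    holds bit 6 and/or bit 7 of BTA6 for the count lines.
    """
    select_b = bit(0xF)
    steps = [
        ("BTA6", options & ~select_b),   # steps 1-2: options, bit F = 0
        ("BTA7", spec_a),                # step 2: Counter A event and mask
        ("BTA6", options | select_b),    # step 3: bit F = 1
        ("BTA7", spec_b),                # step 4: Counter B event and mask
    ]
    if function_and_mask is not None:
        steps.append(("BTA8", function_and_mask))              # step 5
    steps.append(("BTA6", options | select_b | count_bits))    # step 6
    return steps
```

The option bits are written unchanged in every BTA6 step, in keeping with the requirement that bits 8-E of channel BTA6 must not change while a count line is up.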


3.6.4.2 Display Registers

There are two 64-bit display registers that can be monitored by
the MCU. One display register is used for the Current Instruction
Address Register (CIAR) and the other is used for a register that
has been selected by the MCU. The register is selected by a 4-bit
code transmitted on bits C-F of channel BTA1. Any unlisted bits
(such as bits 0-14 for code 3) are undefined.

The MCU must send a read signal to enable the CIAR and the
selected register into the display registers. The read signal has
been defined as bit 8 on channel BTA1 and its leading edge
simultaneously transfers both registers into the display
registers. The register select code must be set up by the MCU
before the read signal is transmitted to the CPU.

The CIAR is received on channels ATB1 - ATB3 of the MCU and may be
read while the CPU is running. The selected register is received
on channels ATB4 - ATB7 of the MCU. See Section 3.6.1.1 for bit
assignments. The selected register on channels ATB4 - ATB7 may
only be read when the CPU is stopped.


(continued)
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3«6«4*2 (Continued) 

The select codes and corresponding registers are 
listed below! 

TABLE 3.6-22 DISPLAY REGISTER SELECT CODES

Code (base 16)   Register(s)                             Bits
--------------   -------------------------------------   -----
0                Current Instruction Register            0-63

1                Data Flag Register                      3-15
                                                         19-31
                                                         35-47
                                                         51-58

2                Invisible Package Address               0-22
                 (Absolute Sword Address)

3                External Interrupt Register             15-31
                   Monitor Interval Timer                15
                   Channel 0                             16
                   Channel 1                             17
                   Channel 2                             18
                   Channel 3                             19
                   Channel 4                             20
                   Channel 5                             21
                   Channel 6                             22
                   Channel 7                             23
                   Channel 8                             24
                   Channel 9                             25
                   Channel 10                            26
                   Channel 11                            27
                   Channel 12                            28
                   Channel 13                            29
                   Channel 14                            30
                   Channel 15                            31

                 Channel Read Active - Write Active      32-63
                   Channel 0                             32-33
                   Channel 1                             34-35
                   Channel 2                             36-37
                   Channel 3                             38-39
                   Channel 4                             40-41
                   Channel 5                             42-43
                   Channel 6                             44-45
                   Channel 7                             46-47
                   Channel 8                             48-49
                   Channel 9                             50-51
                   Channel 10                            52-53
                   Channel 11                            54-55
                   Channel 12                            56-57
                   Channel 13                            58-59
                   Channel 14                            60-61
                   Channel 15                            62-63

TABLE 3.6-22 DISPLAY REGISTER SELECT CODES (Cont.)

Code (base 16)   Register(s)                                  Bits
--------------   ------------------------------------------   -----
4                SECDED Fault Read Bus Code                   0-2
                   I/O Bus    = Code 0
                   R1 Bus     = Code 1
                   R2 Bus     = Code 2
                   R3 Bus     = Code 3
                   Scalar Bus = Code 4
                   RNS Bus    = Code 5
                 Instruction Stack Parity Fault               4
                 MIC Memory 0 Parity Fault                    5
                 MIC Memory 1 Parity Fault                    6
                 Scalar MIC Parity Fault                      7
                 Double SECDED Error. Syndrome bits must be   8
                 checked to determine if address and Bus
                 Code are valid.
                 Syndrome Bits                                9-15
                 Parity Fault on Auxiliary Board 0            16
                 Parity Fault on Auxiliary Board 1            17
                 Parity Fault on Auxiliary Board 2            18
                 Parity Fault on Auxiliary Board 3            19
                 Parity Fault on Auxiliary Board 4            20
                 Parity Fault on Auxiliary Board 5            21
                 Parity Fault on Auxiliary Board 6            22
                 Parity Fault on Auxiliary Board 7            23
                 Parity Fault on Auxiliary Board 8            24
                 PMOl Enabled for Parity Checking             25
                 Scalar Microcode Address - Bit 0             26
                 Scalar Microcode Address - Bit 1             27
                 Scalar Microcode Address - Bit 2             28
                 Scalar Microcode Address - Bit 3             29
                 Scalar Microcode Address - Bit 4             30
                 Scalar Microcode Address - Bit 5             31
                 Scalar Microcode Address - Bit 6             32
                 Scalar Microcode Address - Bit 7             33

NOTE: All Fault/Error conditions are cleared by the "Clear Fault"
signal from the MCU except the SECDED Error and the Syndrome bits.
These are cleared/released by the "Clear Single Error" signal from
the MCU.
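The bit assignments for select code 4 can be illustrated with a short Python sketch. It is not part of the specification: the helper names are hypothetical, and it assumes that bit 0 is the most significant bit of the 64-bit display register word, which the table does not state. The fault address field (bits 34-63) is taken from the continuation of Table 3.6-22.

```python
WORD_BITS = 64

# Bus codes as listed under "SECDED Fault Read Bus Code".
BUS_NAMES = {0: "I/O", 1: "R1", 2: "R2", 3: "R3", 4: "Scalar", 5: "RNS"}


def field(word: int, first: int, last: int) -> int:
    """Extract bits first..last inclusive, ASSUMING bit 0 is the MSB."""
    width = last - first + 1
    shift = WORD_BITS - 1 - last
    return (word >> shift) & ((1 << width) - 1)


def decode_fault_word(word: int) -> dict:
    """Split a select-code-4 display word into the fields of Table 3.6-22."""
    return {
        "read_bus": BUS_NAMES.get(field(word, 0, 2), "unknown"),
        "instruction_stack_parity": bool(field(word, 4, 4)),
        "mic_memory_0_parity": bool(field(word, 5, 5)),
        "mic_memory_1_parity": bool(field(word, 6, 6)),
        "scalar_mic_parity": bool(field(word, 7, 7)),
        "double_error": bool(field(word, 8, 8)),
        "syndrome": field(word, 9, 15),
        # Auxiliary boards 0..8 occupy bits 16..24.
        "aux_board_parity": [bool(field(word, 16 + n, 16 + n)) for n in range(9)],
        "scalar_microcode_address": field(word, 26, 33),
        "fault_address": field(word, 34, 63),
    }
```

When the double-error bit is set, the table directs the reader to check the syndrome bits before trusting the bus code and address; the sketch only extracts the fields and leaves that check to the caller.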


CONTROL DATA CORPORATION          ENGINEERING SPECIFICATION
R A D L                           NO. 1035A637
                                  DATE Dec. 1977
                                  PAGE 131
                                  REV. -

TABLE 3.6-22 DISPLAY REGISTER SELECT CODES (Cont.)

Code (base 16)   Register(s)                                     Bits
--------------   ---------------------------------------------   -----
4 (continued)    SECDED Fault Address                            34-63
                 (Absolute physical bit address, significant
                 to the half-word level)

                 The address of the first SECDED error is
                 retained in this register.

                 The SECDED Fault Address is released by the
                 Clear Single Error Condition Signal from the
                 MCU.

5                Bounds Hit Address                              0-31
                 (Absolute physical bit address, right
                 justified)

                 The address of the first bounds hit is
                 retained in this register. The bounds hit
                 address is released by the Clear Fault
                 Condition Signal from the MCU. The bounds
                 checking is performed on half-word boundaries
                 only.

6                Counter A1                                      0-15
                 Counter A2                                      16-31
                 Counter B1                                      32-47
                 Counter B2                                      48-63

                 If bit 8 of channel BTA6 in the MCU is a "0",
                 both counters will be cleared after the read
                 signal is received and after both counters are
                 transferred into the display register. If
                 bit 8 is a "1", the counters will not be
                 cleared.

                 To ensure proper initialization of the counters,
                 the count lines must be made zero prior to the
                 new count selection.

3.6.4.3 Logic Fault Monitoring

There are two types of logic faults detected in the
computer. They are memory SECDED and MIC memory
parity. When a logic fault is detected, the computer
stops between instructions. The type of fault may
be sensed on channel ATB8. (See Section 3.6.1.)

After sensing the logic fault, the MCU must clear the
fault via bit 7 of channel BTA1. The MCU must
determine the appropriate response to the fault
and has the option of restarting the CPU by setting
bit 3 of channel BTA1.

3.7 Swap Unit

Figure 3.7-1 gives an overall block diagram of the
Swap Unit. This unit performs swapping operations
for data from the Register File to Main
Memory and from Main Memory to the Register File during
exchange operations and register file swap
operations (70 instruction). The Swap Unit also
performs block swap operations between Main Memory
and the Backing Store. Finally, the Swap Unit
provides the I/O interface to the Backing Store.

[Figure 3.7-1. FMP Swap Unit. Block diagram; legible labels:
Backing Store, Data from I/O Unit, Data from Memory, Data to
Scalar Processor.]

3.7.1 Data Movement

The Swap Unit moves swap data between register files
and memories at the rate of 128 bits per minor cycle.
Data transmitted to/from Main Memory is transferred
in 512-bit (plus SECDED) segments. Thus a set of
assembly and disassembly buffers is provided to
agglutinate or decompose data into/from 512-bit
units from/to the transfer quantity of 128 bits.
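
The buffering described here can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, check bits are ignored, and the ordering of the four 128-bit segments within a 512-bit unit (most significant first) is an assumption that the text does not specify.

```python
SEGMENT_BITS = 128   # transfer quantity per minor cycle
UNIT_BITS = 512      # Main Memory transfer unit
SEGMENTS_PER_UNIT = UNIT_BITS // SEGMENT_BITS  # 4


def disassemble(unit: int) -> list[int]:
    """Split one 512-bit unit into four 128-bit segments (MSB segment first, assumed)."""
    mask = (1 << SEGMENT_BITS) - 1
    return [
        (unit >> (SEGMENT_BITS * (SEGMENTS_PER_UNIT - 1 - i))) & mask
        for i in range(SEGMENTS_PER_UNIT)
    ]


def assemble(segments: list[int]) -> int:
    """Rebuild a 512-bit unit from four 128-bit segments in the same order."""
    if len(segments) != SEGMENTS_PER_UNIT:
        raise ValueError("exactly four 128-bit segments are required")
    unit = 0
    for seg in segments:
        unit = (unit << SEGMENT_BITS) | seg
    return unit
```

At 128 bits per minor cycle, one 512-bit unit occupies the swap path for four minor cycles in this model.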

3.7.2 Error Checking 

SECDED is carried on all trunks (including the
register file swap trunk) and in the data buffers.
SECDED originating in the Main Memory is composed of
seven bits for every 32 data bits. SECDED is
carried this way (seven bits for every 32 bits)
throughout the FMP with the exception of the Backing
Store. SECDED within the Backing Store consists of 11
error code bits for each 512 data bits; the
conversion between seven bits for every 32 and 11
bits for every 512 must be accomplished at the
Backing Store interface.
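
The two check-bit counts quoted above (7 per 32 data bits, 11 per 512 data bits) agree with the minimum for an extended Hamming code. The sketch below assumes that construction, which the specification does not name; the function name is hypothetical.

```python
def secded_check_bits(data_bits: int) -> int:
    """Minimum check bits for a Hamming single-error-correcting code
    plus one overall parity bit for double-error detection."""
    r = 1
    # A Hamming code with r check bits must satisfy 2**r >= data + r + 1.
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r + 1  # the extra bit is the overall parity bit


def check_bits_per_512(word_bits: int) -> int:
    """Total check bits carried with 512 data bits when coded per word_bits."""
    return (512 // word_bits) * secded_check_bits(word_bits)
```

Under this assumption, 512 data bits carry 112 check bits in the 32-bit format used on the trunks and 11 check bits in the Backing Store format, which is why the conversion at the interface is worthwhile for storage density.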

In the event that a single-bit error is detected by
the Swap Unit, a flag is sent to the MCU and the
counters pointing to the data plus syndrome bits are
locked up for sampling by the MCU. Note that in
such cases only the first error will be locked up in
the MCU interface.

When a double-bit error is detected, the Swap Unit
is halted immediately, stop flags are sent to the
Scalar, Map and Vector Units, and the MCU is alerted.
Location and nature of the error are locked up as in
the case of the single-bit error.

3.7.3 Addressing

All transfers involving the Backing Store occur in
blocks of 32,768 64-bit words. All exchanges between
the Register File and Main Memory occur in
multiples of 128-bit quarter-swords, beginning on a
quarter-sword boundary. When a reference to a
particular Backing Store block is made, data
transmission from that block begins as soon as the
particular block is selected, rather than waiting
for the first word of the block to appear at the
output or input ports of the Backing Store. This is
necessary to reduce the latency time inherent in
block-organized CCD memory systems. With a block
length of 32,768 words the latency time to wait for
the first word can be as long as 3.2 milliseconds.
This can be reduced to near zero by beginning
transfers immediately.

The ability to begin Backing Store transfers 
immediately requires address control information to 
be exchanged between the Backing Store and the Swap 
Unit. In particular, the current address being read 
by the Backing Store must be sent to the Swap Unit 
so that Main Memory write addresses can be 
adjusted to match the starting word location. The 
same is true for the Read from Memory, Write to 
Backing Store. 
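
A minimal Python sketch of the address adjustment follows. It assumes that the Backing Store delivers words in circular order starting from its current position and that the block occupies consecutive word addresses in Main Memory; the function names and the word-granular addressing are illustrative, not taken from the specification.

```python
BLOCK_WORDS = 32_768  # 64-bit words per Backing Store block


def transfer_order(start_word: int, block_words: int = BLOCK_WORDS):
    """Yield word offsets in the order they leave the Backing Store when the
    transfer begins at start_word instead of at word 0 (circular order assumed)."""
    for i in range(block_words):
        yield (start_word + i) % block_words


def memory_address(base: int, word_offset: int) -> int:
    """Main Memory word address for a given offset within the block."""
    return base + word_offset
```

Because every offset is visited exactly once, the block lands in Main Memory in its normal layout regardless of where in the block the transfer started.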

3.7.4 Address Queue and Backing Store Map 

The Swap Unit can hold four swap instructions at a
time: one in process and three waiting to be
executed. This permits the object code in the
Scalar Processor to release a series of swap
instructions local to a segment of computation and go
on to something else. As addresses and swap control
information arrive at the Swap Unit, the
corresponding blocks in the Backing Store are set
busy in the backing store map. This prevents
accidental overlap of swap operations. A similar
busy is set for Main Memory by the Memory
Interchange to prevent erroneous conflict situations
from arising.


I/O operations are tunneled through the Swap Unit
by checking the busy map. I/O operations are
permitted to occur only with busy blocks (which have
been set busy by the monitor). At the completion of
an I/O operation, the I/O processor can clear the
busy bit or instruct the monitor to clear it.

In the event that two swap requests are made to the
same block of Backing Store, such that the second
request encounters a busy block, the request will be
rejected and a data flag bit set to indicate that an
error condition has arisen. The programmer may
choose to sample the data flag bit or to enable an
automatic data flag branch to an error handling
routine.
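
The queueing and busy-map behaviour can be modelled with a small Python class. This is a sketch under stated assumptions: the class and method names are hypothetical, and the behaviour when all four queue positions are occupied is not described in the text, so the model simply refuses the request without setting the data flag.

```python
from collections import deque

QUEUE_DEPTH = 4  # one swap in process plus three waiting


class SwapUnitModel:
    """Toy model of the swap address queue and the backing store busy map."""

    def __init__(self) -> None:
        self.queue = deque()      # block numbers, oldest first
        self.busy_blocks = set()  # blocks marked busy in the backing store map
        self.data_flag = False    # set when a request hits a busy block

    def request_swap(self, block: int) -> bool:
        """Accept a swap for block, or reject it if the block is already busy."""
        if block in self.busy_blocks:
            self.data_flag = True  # error condition described in the text
            return False
        if len(self.queue) >= QUEUE_DEPTH:
            return False  # ASSUMPTION: full queue refuses without flagging
        self.busy_blocks.add(block)
        self.queue.append(block)
        return True

    def complete_swap(self) -> int:
        """Finish the oldest swap and clear its busy bit."""
        block = self.queue.popleft()
        self.busy_blocks.discard(block)
        return block
```

A program in this model would either test `data_flag` after issuing swaps or, as the text notes, rely on an automatic data flag branch.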

3.7.5 Control Signals

(To be defined later) 

3.7.6 Microcode Control Terms 

(To be defined later) 

3.7.7 Interface Signals 

3.8 Backing Store

The Backing Storage subsystem provides a massive,
on-line memory with fast access and high block
transfer bandwidths. The memory is engineered into
32-million-word modules, so the minimum
configuration is one 32-million-word cabinet. The
maximum configuration is governed by available space,
but with the 65K chips now available in CCD (charge
coupled devices) the practical maximum appears to be
256-million words. The allowable address space
permits an expansion up to 1-billion words of
Backing Storage as technology developments make such
volumes feasible.

3.8.1 Data Movement

Data is read/stored from/into the CCD memory on
512-bit paths. Every 400 nanoseconds 512 bits are
read in parallel from a selected rank of memory
chips. The data is latched into 512-bit holding
registers and disassembled into 128-bit segments for
transmission to the Swap Unit every tenth minor
cycle. Writing of the Backing Store proceeds in
reverse order with 128-bit segments being transmitted
and assembled every tenth minor cycle, and the
resulting 512 bits being written to the Backing Store
every 400 nanoseconds.
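
The figures in this paragraph are mutually consistent if the minor cycle is 10 nanoseconds, the value predicted in the timing section of this specification. The Python sketch below checks that arithmetic; the function names are hypothetical and the 10 ns default is that prediction, not a measured value.

```python
def segments_per_unit(unit_bits: int = 512, segment_bits: int = 128) -> int:
    """Number of 128-bit segments in one 512-bit rank read."""
    return unit_bits // segment_bits


def unit_period_ns(minor_cycle_ns: float = 10.0, cycles_per_segment: int = 10) -> float:
    """Time to pass one 512-bit unit when a segment moves every tenth minor cycle."""
    return segments_per_unit() * cycles_per_segment * minor_cycle_ns


def bandwidth_bits_per_us(unit_bits: int = 512, rank_period_ns: float = 400.0) -> float:
    """Sustained Backing Store transfer rate in bits per microsecond."""
    return unit_bits * 1000.0 / rank_period_ns
```

With a 10 ns minor cycle, four segments at ten cycles each take exactly the 400 ns rank period. With the 15 ns worst-case cycle the same segment schedule would take 600 ns, so the two rates would no longer match.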

3.8.2 Error Checking

The Swap Unit transmits and receives 128 bits, plus 7
bits of SECDED for each 32 data bits, between the
Backing Store and itself, while 11 bits of SECDED are
stored and retrieved with each 512 bits of data in
the Backing Store. The conversion from 7 SECDED
bits per 32 data bits to 11 SECDED bits per 512 data
bits (and vice versa) is done in the Backing Store
Unit at the assembly/disassembly interface. While
the new SECDED code is generated, previous SECDED is
checked with appropriate error correction and/or
flagging taking place.

3.8.3 Addressing

A read or write operation is initiated by the Swap
Unit sending a basic block address and read or write
signal to the Backing Store. In return the Backing
Store Unit transmits the next word address available
for reading and writing. The Swap Unit then adjusts
its memory addresses so that data transfer can begin
immediately.

3.8.4 Control Signals

(To be defined later)

3.8.5 Interface Signals

(To be defined later)

3.9 Timing Information

The FMP is in preliminary design phase so only the
most preliminary timing estimates are available.
All estimates are given in CPU minor clock cycles.
The period of this clock cycle is predicted to be 10
nanoseconds, but with extant technology can be no
worse than 15 nanoseconds.

3.9.1 Scalar Processor Timing

The table in Section 3.9.1.2 is designed to provide
scalar timing data for the instruction sequences in
FMP. All timing data is expressed in minor cycles.

3.9.1.1

Multi-operand instructions are typically expressed as
overhead + (number of cycles per operand) (number of
operands).

Scalar instructions are expressed as described below.

The ISSUE portion of the table gives the minimum
number of minor cycles between the issue of the
specific instruction listed in the left column and
the issue of the next instruction in the sequence.
Various operand or memory conflicts (as discussed
later) can cause additional delay beyond this
minimum time.

The ISSUE portion of the table is subdivided when
appropriate into three categories as defined below:

NB  -- No Branch
ISB -- In Stack Branch
OSB -- Out of Stack Branch to first quarter-sword.
       This time must be increased by 1, 2 or 3 if
       the Branch address is in the 2nd, 3rd or 4th
       quarter-sword respectively.

The non-branch instructions use the entry under NB
or No Branch. Example 1 illustrates a simple, no
conflict, branch sequence.

Example 1

     Instr.   R   S   T   Comments
A)   60       -   -   -   Register designators which are not
                          pertinent to the example are
                          represented by "-".
B)   25       -   -   -   Branch condition not met.
C)   65       -   -   -
D)   25       -   -   -   Branch condition met; branch out of
                          stack to instruction E in 3rd
                          quarter-sword.

Sequence Timing

Instr. A issues at minor cycle 0
Instr. B issues at minor cycle 1
Instr. C issues at minor cycle 12
Instr. D issues at minor cycle 13
Instr. E issues at minor cycle 42

The RESULT AVAILABLE portion of the table contains
information necessary to time instruction sequences
with operand dependencies. The first column, SS or
shortstop, contains entries for those instructions
which use the Scalar Floating-Point Unit. These are
the instructions which may use the shortstop feature
to provide an input operand. This entry is the
number of minor cycles after issue that the result
operand will be available at the shortstop for use
with a following instruction.

If instruction A issues at minor cycle X, any
following instruction, B, needing the result of A
must issue no later than minor cycle X+SS to utilize
the shortstop. A floating-point instruction needing
the result of A can be issued before X+SS and wait
at the input of Floating Point for the shortstopped
result of A. This allows other non-floating-point
instructions to issue. The resulting time of an
instruction that issues and waits for shortstop
will be as if it had issued at the ideal time to
match shortstop. A subsequent instruction requiring
access to Floating-Point will not issue any earlier
than (X+SS) + 1.

If instruction B issues later than cycle X+SS, thus
missing the shortstop, instruction B must wait until
at least X+RF. At this time the desired operand
will be available from the Register File. Example 2
illustrates operations using shortstop.


Example 2

     Instr.   R    S    T    Comments
A)   60       -    -    12
B)   60       -    -    -
C)   60       -    -    -
D)   60       -    -    -
E)   60       -    -    -
F)   60       12   -    14
G)   60       14   14   -
H)   7F       -    -    -
I)   60       -    -    15
J)   60       14   -    16
K)   60       16   15   -
L)   60       -    -    -

Sequence Timing

Instr. A issues at 0
Instr. B issues at 1
Instr. C issues at 2
Instr. D issues at 3
Instr. E issues at 4
Instr. F issues at 5    -- Thus exactly matching shortstop.
Instr. G issues at 6    -- Issues but must wait at the input of
                           Floating Point for the result of
                           instruction F to be available.
Instr. H issues at 7
Instr. I issues at 11   -- Cannot issue until instruction G
                           catches shortstop and proceeds. Thus
                           3 minor cycles not used.
Instr. J issues at 13   -- Missed result of instruction F at
                           shortstop, thus waiting until operand
                           is available from Register File.
Instr. K issues at 14   -- Issues and waits at input to Floating
                           Point.
Instr. L issues at 19   -- Instruction K is treated as if issued
                           at 18 and then L at 19.
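
The shortstop rule can be written as a small Python function and checked against the issue times in Example 2. The sketch is a simplified model for a floating-point consumer with a single producer; the function name is hypothetical, and the values SS = 5 and RF = 8 for the 60 instruction are those legible in Table 3.9-2.

```python
def effective_issue(producer_issue: int, ss: int, rf: int, earliest: int) -> int:
    """Effective issue cycle of a floating-point instruction that needs the
    result of a producer issued at producer_issue.

    earliest is the first cycle at which the consumer could otherwise issue.
    If it can issue by producer_issue + ss it catches the shortstop and is
    timed as if issued exactly then; otherwise it waits for the Register File.
    """
    catch = producer_issue + ss
    if earliest <= catch:
        return catch
    return max(earliest, producer_issue + rf)


SS_60, RF_60 = 5, 8                               # from Table 3.9-2, instruction 60
f_time = effective_issue(0, SS_60, RF_60, 5)       # F uses result of A
g_time = effective_issue(f_time, SS_60, RF_60, 6)  # G waits for F at Floating Point
i_issue = g_time + 1                               # next Floating-Point user after G
j_time = effective_issue(f_time, SS_60, RF_60, i_issue + 1)  # J missed F's shortstop
k_time = effective_issue(j_time, SS_60, RF_60, j_time + 1)   # K waits for J
l_issue = k_time + 1
```

The model does not cover consumers with two pending producers (as instruction K strictly has) or non-floating-point consumers, which the text treats separately.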


The last column under RESULT AVAILABLE (MEM)
contains entries for those scalar instructions (13,
32, 5F, 7F) which store a result into Main Memory.
The time listed is the minimum time until the
operand is in memory and available for use. The
time may also be increased by 4 minor cycles if the
desired memory bank is busy.

The UNIT BUSY portion of the table concerns
instructions issued to either the Divide/Convert
(D/C) Unit or the Load/Store Unit (L/S). The
Divide/Convert Unit executes the 10, 11, 4C, 4F, 53,
6C, 6F and 73 instructions. This unit is the only
portion of Scalar Floating-Point which is not completely
pipelined; thus the appropriate unit busy time
listed in the table must elapse before a third
instruction can be issued to the Divide/Convert Unit.
Floating-Point instructions other than these eight
may be issued to Floating-Point while the
Divide/Convert Unit is busy. A second instruction
from the set of eight can be issued, but will be held
in front of the Floating-Point Unit and issuing of
non-floating-point instructions will continue.


The Load/Store Unit executes the 12, 13, 32, 5E, 5F,
7E and 7F instructions. There are six address
registers in the Load/Store Unit which enable
requests to be stacked and executed in the proper
order. The 12, 5E and 7E instructions each require
one register and can be executed (in the absence of
memory conflicts) at the rate of one load per minor
cycle. The 5F and 7F instructions each require two
address registers and can be executed at one store
per two minor cycles. The 13 and 32 instructions
each require two address registers and can be
executed at one per 14 and 15 minor cycles,
respectively.

The Load/Store Unit is then capable of streaming
Load/Store instructions (other than the 13 and 32)
at one minor cycle per load and two minor cycles per
store assuming no Memory or Register File conflicts.
For example, a stream of N loads will execute in N +
14 minor cycles from the issue of the first load
until the operand from the last load is available in
the Register File. A stream of N stores will
execute in 2N + |N/3| minor cycles from issue of the
first store until issue of the last store.

Example 3

     Instr.   R   S   T   Comments
A)   60       -   -   -
B)   7E       -   -   -
C)   13       -   -   -
D)   13       -   -   -
E)   60       -   -   -
F)   7E       -   -   -
G)   7E       -   -   -
H)   7E       -   -   -
I)   13       -   -   -

Sequence Timing

Instr. A issues at 0
Instr. B issues at 1
Instr. C issues at 2
Instr. D issues at 4
Instr. E issues at 6
Instr. F issues at 7
Instr. G issues at 8
Instr. H issues at 19   -- Instr. H must wait for address
                           register to become free from
                           Instr. C.
Instr. I issues at 35   -- Instr. I must wait for address
                           register.



There are three additional operand dependencies which
must be considered.

1. Source operand conflict -- an instruction
requiring the result of a previous instruction
as an input operand waits until the operand
becomes available.

2. Output operand conflict -- an instruction output
to the same Register File location as a
previously issued, but slower, instruction waits
until the previous instruction stores its result
in the Register File.

3. Register File Write conflict -- an instruction
cannot issue if its result arrives at the
Register File at the same minor cycle as the
result of a previously issued but slower
instruction.
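
The third rule, the Register File Write conflict, lends itself to a short Python illustration. The sketch assumes that a blocked instruction is retried one minor cycle at a time, which the text does not state; the function and parameter names are hypothetical.

```python
def earliest_issue(candidate: int, latency: int, reserved_arrivals: set[int]) -> int:
    """Return the first cycle >= candidate at which an instruction with the
    given issue-to-Register-File latency can issue without its result
    arriving in a cycle already claimed by an earlier, slower instruction."""
    issue = candidate
    while issue + latency in reserved_arrivals:
        issue += 1  # ASSUMPTION: issue is retried one minor cycle later
    return issue
```

For instance, if an instruction issued at cycle 0 delivers its result at cycle 8, a following instruction with a 4-cycle latency cannot issue at cycle 4 and slips to cycle 5 in this model.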

Table 3.9-1 pertains to instructions having greater
than 1 minor cycle issue time.

The first column lists the appropriate instructions.
The second column indicates the minor cycle of issue
at which a specific operand is required. The third
column indicates the availability of shortstop for
that specific operand.


TABLE 3.9-1. ORDER THAT DESIGNATORS ARE READ FOR MULTIPLE
ISSUE INSTRUCTIONS AND IF THEY CAN CATCH SHORTSTOP OF A
PREVIOUS INSTRUCTION

INSTRUCTION            DESCRIPTION   SHORTSTOP
--------------------   -----------   ---------
13             1st     R&S&T         No
               2nd     T             No

20-27          1st     T             No
               2nd     R&S           Yes

2F             1st     S             No
               2nd     T             No
               3rd     T             No

31, 35         1st     S&T           No
               2nd     R             No
               3rd     R             No

32             1st     S             No
               2nd     T             No

36             1st     S&T           No
               2nd     R             No
               3rd     R             No

5F             1st     R&S           No
               2nd     T             No

6D             1st     R&S           Yes (R&S)
               2nd     T             No

7F             1st     R&S&T         No
               2nd     T             No

B0-B5.X00X-X   1st     B&Y           No
               2nd     X&A           No
               3rd     Z             No
               4th     X&A&C         No

B0-B5.X01X-X   1st     B&Y           No
               2nd     X&A&C         Yes (X&A)
               3rd     Z             Yes

B0-B5.X10X-X   1st     B&Y           No
               2nd     X&A           Yes

B0-B5.X11X-X   1st     B&Y           No
               2nd     X&A           Yes

B6             1st     R             No

Example 4

     Instr.   R    S    T    Comments
A)   60       -    -    12
B)   35       10   -    12   Specifies an out of stack branch to
                             Instruction C in the 2nd quarter
                             sword.
C)   60       -    -    35
D)   B0       40   35   -    Specifies an in stack branch to
                             Instruction E.
E)   60       -    -    -

Sequence Timing

Instr. A issues at 0
Instr. B issues at 8    -- B must wait for Result from A to be
                           stored into the Register File.
Instr. C issues at 32
Instr. D issues at 36   -- Result from Instruction C available
                           from Shortstop at time 37 allows
                           issue at 36.
Instr. E issues at 48
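
The out-of-stack branch adjustment can be checked against this example with a few lines of Python. The sketch is illustrative: the function name is hypothetical, the base OSB time of 23 minor cycles for instruction 35 is the value legible in Table 3.9-2, and the one-cycle increment per quarter-sword is inferred from the fact that it reproduces the issue time of Instruction C.

```python
def osb_delay(base_osb: int, quarter_sword: int) -> int:
    """Issue-to-issue delay for an out-of-stack branch whose target lies in
    the given quarter-sword (1..4). The table entry applies to the first
    quarter-sword; each later one is ASSUMED to add one minor cycle."""
    if quarter_sword not in (1, 2, 3, 4):
        raise ValueError("quarter_sword must be 1, 2, 3 or 4")
    return base_osb + (quarter_sword - 1)


OSB_35 = 23                            # Table 3.9-2, instruction 35, OSB column
b_issue = 8                            # from the Sequence Timing above
c_issue = b_issue + osb_delay(OSB_35, 2)
```

With the branch target in the 2nd quarter-sword the delay is 24 minor cycles, which places Instruction C at minor cycle 32 as listed.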

3.9.1.2 Basic Instruction Timing

TABLE 3.9-2 SCALAR INSTRUCTION TIMES

                       Issue           Result Avail.      Unit Busy
Instructions      NB   ISB   OSB     S.S.   R.F.   MEM    L/S   D/C
------------      --   ---   ---     ----   ----   ---    ---   ---
2A                 1    --    --       3      6     --
2B                 1    --    --       3      6     --
2C                 1    --    --       3      6     --
2D                 1    --    --       3      6     --
2E                 1    --    --       3      6     --
2F                 7          23              7
30                 1                          5
31                 7          23              7

* MUST ADD 5 MC FOR REGISTER RELEASE

TABLE 3.9-2 SCALAR INSTRUCTION TIMES (Cont.)

                       Issue           Result Avail.      Unit Busy
Instructions      NB   ISB   OSB     S.S.   R.F.   MEM    L/S   D/C
------------      --   ---   ---     ----   ----   ---    ---   ---
32.00X-X           2    --    --      --     --     24    15*
32.01X-X                 9    24      --     --     24    15*
32.1X-X           20    21    36      --     --     24    15*
33.XXXXX0XX
33.XXXXX1XX
34                 1    --    --       3      6     --
35                 7     8    23      --      5     --
36,R=T,S=0         5    --    --      --     --
36,R=T,S≠0               9    24      --
36,R≠T                   8    23      --      5     --
37                32                                --
38                 1    --    --              4     --
39                30                                --
3A                20                                --
3B                26                                --
3C                 1    --    --       5      8     --
3D                 1    --    --       5      8     --

* MUST ADD 5 MC FOR REGISTER RELEASE

TABLE 3.9-2 SCALAR INSTRUCTION TIMES (Cont.)

                       Issue           Result Avail.      Unit Busy
Instructions      NB   ISB   OSB     S.S.   R.F.   MEM    L/S   D/C
------------      --   ---   ---     ----   ----   ---    ---   ---
54                 1    --    --       5      8     --
55                 1    --    --       5      8     --
58                 1    --    --       1      4     --
59                 1    --    --       5      8     --
5A                 1    --    --       3      6     --
5B                 1    --    --       3      6     --
5C                 1    --    --       5      8     --
5D                 1    --    --       5      8     --
5E                 1    --    --      --     15     --     1*
5F                 2    --    --      --     --     10     2*
60                 1    --    --       5      8     --
61                 1    --    --       5      8     --
62                 1    --    --       5      8     --
63                 1    --    --       1      4     --
64                 1    --    --       5      8     --
65                 1    --    --       5      8     --
66                 1    --    --       5      8     --
67                 1    --    --       1      4     --

* MUST ADD 5 MC FOR REGISTER RELEASE

TABLE 3.9-2 SCALAR INSTRUCTION TIMES (Cont.)

                       Issue           Result Avail.      Unit Busy
Instructions      NB   ISB   OSB     S.S.   R.F.   MEM    L/S   D/C
------------      --   ---   ---     ----   ----   ---    ---   ---
68                 1    --    --       5      8     --
69                 1    --    --       5      8     --
6B                 1    --    --       5      8     --
6C                 1    --    --      54     57     --          125
6D                 2    --    --       4      7     --
6E                 1    --    --       3      6     --
6F                 1    --    --      54     57     --          125
70                 1    --    --       5      8     --
71                 1    --    --       5      8     --
72                 1    --    --       5      8     --
73                 1    --    --      53     56     --          125
74                 1    --    --       5      8     --
75                 1    --    --       5      8     --
76                 1    --    --       5      8     --
77                 1    --    --       5      8     --
78                 1    --    --       1      4     --
79                 1    --    --       5      8     --
7A                 1    --    --       3      6     --

TABLE 3.9-2 SCALAR INSTRUCTION TIMES (Cont.)

                       Issue           Result Avail.      Unit Busy
Instructions      NB   ISB   OSB     S.S.   R.F.   MEM    L/S   D/C
------------      --   ---   ---     ----   ----   ---    ---   ---
7B                 1    --    --       3      6     --
7C                 1    --    --       3      6     --
7E                 1    --    --      --     15     --     1*
7F                 2    --    --      --     --     10     2*
B0.X00X-X          8     9    24      --      8     --
B0.X01X-X          3    --    --       5    5+8**   --
B0.X10X-X         11    12    27      --     --     --
B0.X11X-X          2    --    --       6      9     --
B1.X00X-X          8     9    24      --      8     --
B1.X01X-X          3    --    --       5    5+8**   --
B1.X10X-X         11    12    27      --     --     --
B1.X11X-X          2    --    --       6      9     --
B2.X00X-X          8     9    24      --      8     --
B2.X01X-X          3    --    --       5    5+8**   --
B2.X10X-X         11    12    27      --     --     --
B2.X11X-X          2    --    --       6      9     --

*  MUST ADD 5 MC FOR REGISTER RELEASE.

** Output to be stored in Register C is available at 5 cycles and Y
   at 8 cycles. Y may be used from the Shortstop at time 5. C
   cannot be shortstopped.
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TABLE 3.9-2 SCALAR INSTRUCTION TIMES {Cont.) 


I Issue 


Result Avail, ■ JUnit Busy 


Instructions I 

NB I 

IS8 I 

OSB I 

s.s. 

R.F. 

1 

MEM i 

L/S 1 D/C 



1 



1 

■ 


T 

3 

t 

\ 

1 

1 

B3 , XO 0 

X 

1 

X 

1 

8 1 

9 1 

24 1 

f 

— - 

8 

1 

1 

\ 

f 

1 

1 

1 

1 

I 

f 

83.X01 

X-X 

1 

f 

3 J 

-- I 

t 

5 

5+8^^ 

I 

^ . « 
t 

1 

33 , XlO 

X-X 


ii ! 

12 1 

27 1 

I 

— 


1 

1 

1 

1 

B3 .Xll 

X-X 

1 

i 

j 

2 1 

— • i 

1 

'6 

9 


— 1 
1 

1 

I 

B4.X00 

X-X 

1 

1 

1 

t 

8 1 

9 1 

1 

1 

24 I 

1 

— 

8 

j 

1 

1 

-- 1 
f 

1 

1 

1 

i 

B4,xai 

X-X 

1 

1 

I 

j 

3 1 

-- ! 

1 

5 

5+8-^^ 

i 

— 1 
I 

1 

84.X10 

X-X 

1 

• 

11 1 

12 1 

27 J 
1 

— 


1 

1 

1 

— 1 
1 

1 

t 

B4.X11 

X-X 

1 

2 1 

— J 

1 

6 

9 

J 

{ 

i 

1 

- 


1 

J 



1 

1 




1 

} 

1 

1 

B5 , xtro 

X-X 

I 

8 ! 

9 i 

24 5 

— 

8 


-- 1 

1 



1 



! 




1 

1 

B5.X01 

X 

X 

1 

1 

3 I 

• - ] 

J 

5 

5 + S»^ 

1 

« 

-- I 

1 

I 

! 

85.X10 

X-X 

1 

$ 

11 I 

12 1 

27 1 

1 

— 

— 

t 

« 

I 

! 

I 

1 

1 

! 

85, Xll 

X-X 

1 

i 

2 1 

1 

1 

1 

6 

9 

• 

-- 1 
1 

1 

J 

86 


I 

1 

1 

7 I 

8 1 

1 

23 1 



1 

t 

\ 

1 

1 

1 

> 

1 



} 



1 



1 

' 1 

1 

BE 


1 

1 

1 ! 

— — i 

1 

1 

4 

1 

! 

1 

I 

t 

1 

1 

1 

1 

8F 


1 

1 

f 

1 1 

-- ! 

! 

1 

4 

1 

! 

1 

1 

i 

f 

1 

CD 

- 

t 

1 

1 

1 I 

— 1 

1 

1 

4 

1 

1 

i 

t 

I 

1 

1 

1 

i 

CE 


1 

1 1 

— { 


1 

4 

1 

1 

I 


^^Output to be stored in Register C is available at 5 cycles and Y 
at 8 cycles. Y may be used from the Shortstop at time 5. C cannot 
be shortstopped. 
3.9.2 Vector Processor Timing 

All vector processing times are stated in terms of 
overhead (the time required to start up a vector 
operation) and vector throughput in results per minor 
cycle. Total time for a vector operation is then 
stated as O + N/R, where O is the overhead, R is the 
rate of results per minor cycle, and N = M times the 
ceiling of L/M; L is the vector length and M is 8 for 
64-bit operands or 16 for 32-bit operands. Ceiling 
is the APL operator which returns the smallest 
integer not less than its argument. 
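
The timing formula can be illustrated with a short calculation. The following Python sketch is illustrative only: the function name `vector_time` is hypothetical, and the default of R = M (one full 512-bit transfer of results per minor cycle) is an assumption, not a figure stated in this section.

```python
import math

# M, the number of operands per 512-bit transfer: 8 for 64-bit, 16 for 32-bit.
OPERANDS_PER_TRANSFER = {64: 8, 32: 16}

def vector_time(overhead, length, operand_bits, rate=None):
    """Return O + N/R in minor cycles, where N = M * ceiling(L / M)."""
    m = OPERANDS_PER_TRANSFER[operand_bits]
    n = m * math.ceil(length / m)   # vector length rounded up to a multiple of M
    if rate is None:
        rate = m                    # assumption: R = M results per minor cycle
    return overhead + n / rate

# A 100-element, 64-bit vector with 28 cycles of overhead:
# N = 8 * ceiling(100 / 8) = 104, so the total is 28 + 104 / 8.
print(vector_time(28, 100, 64))
```

Under these assumptions a vector whose length is not a multiple of M costs the same as one padded to the next multiple.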

Vector overhead is variable depending on a number of 
conditions. Its component parts are: 

1. Issue time. The Instruction Issue Unit 
   requires a certain number of cycles to 
   translate and scan over the vector 
   instructions and the included 32-bit 
   packets. 

2. Transmission of control information from the 
   Scalar Unit. 

3. Map Unit setup. The time required to form 
   addresses and initiate the memory requests 
   within the Map Unit. 

4. Transmission of memory request. 

5. Memory access time. 

6. Data transmission to Map Unit. 

7. Data transmission through Map Unit. 

8. Data transmission to Vector Unit. 

9. Time through vector pipelines. 

10. Transmission to Map Unit. 

11. Data path through Map Unit. 

12. Transmission to memory. 

When two vector operations appear in consecutive 
instructions in the Issue Unit, the issue time, 
transmission of control to the Map Unit, and part of 
the buffer setup time are overlapped. Thus the 
overhead in such cases can be reduced. 

Table 3.9-3 gives Vector Processor times for 
operations involving the Vector Units. See section 
3.9.3 for timing of vector operations executed 
within the Map Unit and section 3.9.4 for Swap Unit 
timing. 

TABLE 3.9-3 VECTOR PROCESSOR TIMES 


NO. 10354637 
DATE Dec. 1977 
PAGE 156 
REV. 


Function     Modifier   Source  Destination  Overhead  Sustained Data 
                                             Time^     Rate/Cycle 

Vector       Normal     Memory  Memory       28 + 2P   512 bits 
Vector       Normal     Memory  Buffer       20 + 2P   512 bits 
Vector       Normal     Buffer  Memory       20 + 2P   512 bits 
Vector       Normal     Buffer  Buffer       12 + 2P   512 bits 

Vector       Broadcast  Memory  Memory       28 + 2P   512 bits 
Vector       Broadcast  Memory  Buffer       20 + 2P   512 bits 
Vector       Broadcast  Buffer  Memory       20 + 2P   512 bits 
Vector       Broadcast  Buffer  Buffer       12 + 2P   512 bits 

Buffer Load  None       Memory  Buffer       14 + 2P   512 bits 


^P = Packet Count in the instruction header. 
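
As a worked illustration of the overhead column, the sketch below looks up the fixed part of the overhead by source and destination and adds 2P. The dictionary and function names are hypothetical; the constants are the values read from Table 3.9-3 for the Vector function.

```python
# Fixed overhead (minor cycles) for Vector operations, keyed by
# (source, destination), as read from Table 3.9-3.
VECTOR_OVERHEAD = {
    ("memory", "memory"): 28,
    ("memory", "buffer"): 20,
    ("buffer", "memory"): 20,
    ("buffer", "buffer"): 12,
}

def vector_overhead(source, destination, packet_count):
    """Return the overhead in minor cycles: table constant + 2P."""
    return VECTOR_OVERHEAD[(source, destination)] + 2 * packet_count

# A memory-to-memory operation whose instruction header carries 3 packets:
print(vector_overhead("memory", "memory", 3))
```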
3.9.3 Map Unit Timing 

The MERGE, MASK, COMPRESS, SCATTER and GATHER 
operations are incorporated physically within the 
single Map Unit. These operations have only memory 
as a source of operands, but all except SCATTER can 
deliver results to either memory or the vector 
buffers. Table 3.9-4 gives timing information for 
these Map Unit functions. 
TABLE 3.9-4 MAP UNIT TIMES 


Funct i on 


Mode 

I 

1 

1 

Modifier 

i Desti- 
i nation 

1 

1 

1 Overhead 
I Time^ 

1 Maximum 
! Data 

1 Rate/Cycle 

Word 

1 

1 

1 

None 

J Memory 

36 


2P 

li operand^^ 

I 1 

I i 

1 I 

1 1 

1 ! 

1 V 

i 

1512 bits^^^ 

i • 

1 

» 

1 

1 

None 

1 Buffer 

28 

+ 

2P 

1 

1 

t 

] 

1 

Stride 

1 Memory 

26 


2P 

V 

1 

i 

1 

dl 

1 

} 

I. 

Stride 

1 Buffer 

22 

+ 

2P 

Recor 

1 

None 

1 Memory 

36 

+ 

2P 

} 

f 

1 

None 

1 Buffer 

28 

+ 

2P 

1 < 

1 1 

i t 

1 

t 

1 

1 

} 

Stride 

f Memory 

26 

+ 

2P 

1 i 

1 t 

V 

] 

J 

Stride 

1 Buffer 

22 

+ 

2P 

I V 
1 

I I operand^’^ 

S 1 

Word 

1 

J 

I 

-None 

! Memory 

30 

+ 

2P 

Word 

1 

1 

df 

I 

Stride 

! Memory 

28 

+ 

2P 

1 V 

1 

1512 bits^*=^ 

1 t 

1 V 

1 

18 input 1 

1 operands^’'^ 1 

1 1 

Recor 

None 

i Memory 

30 

+ 

2P 

Recor d f 
1 

Stride 

1 Memory 

28 

+ 

2P 

N/A 

1 

1 

1 

I 

N/A 

\ 

1 

1 Memory 

26 

+ 

2P 

1 

t 

t 

i 

1 

1 

1 

1 

1 

} 

J 

1 Buffer 

22 

+ 

2P 

1 i 

18 input 
ioperands"^-^ 

1 • ! 
1512 bits 
1 t 1 

i 

f 

1 

1 Memory 

24 

+ 

2P 


1 

1 

I 

\ 

i Buffer 

20 


2P 

I V 1 

1 

1 

t 

j 

1 

1 

1 

1 

! Memory 

26 

+ 

2P 

I i 

1 8 output 1 
1 operands*’’^ J 
1 1 
18 output J 
loperands^* f 

V 

1 

1 

1 

V 

1 Buffer 

22 

+ 

2P 


GATHER 

i 

I 

t 

I 

I 


I 

! 

I 

1 

V 

SCATTER 

I 

I 

I 

i 

I 

1 

V 

COMPRESS 

J 

I 

I 

V 

MASK 

I 

V 

MERGE 

I 

I 

V 


^P = Packet count in the instruction header. 

^^Either 32-bit or 64-bit word. 

^^^Rate assumes that swords are moved on sword boundaries; 
if not, maximum rate is 256 bits per cycle. 
3.9.4 Swap Unit Timing 

Swap Unit startup consists of the issue cycle = 1, 
transmission to the Swap Unit = 1, and Swap Unit 
setup = 5 cycles. Once begun, the swap operation 
moves data at the rate of 512 data bits every 400 
nanoseconds. 
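
The swap figures can be combined as follows. This is a sketch only: the names are hypothetical, and it assumes that a final partial transfer still occupies a full 400-nanosecond slot, which this section does not state.

```python
# Startup components in minor cycles: issue + transmission + Swap Unit setup.
STARTUP_CYCLES = 1 + 1 + 5

BITS_PER_TRANSFER = 512   # data bits moved per transfer
NS_PER_TRANSFER = 400     # nanoseconds per transfer

def swap_transfer_ns(data_bits):
    """Return the data movement time in nanoseconds, excluding startup."""
    transfers = -(-data_bits // BITS_PER_TRANSFER)  # ceiling division
    return transfers * NS_PER_TRANSFER

print(STARTUP_CYCLES, swap_transfer_ns(65536))
```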

4.0 QUALITY ASSURANCE PROVISIONS - Not Applicable 

5.0 PREPARATION FOR DELIVERY - Not Applicable 

6.0 NOTES 

6.1 Intercom 

The CDC FMP has an intercom system which is utilized 
primarily for maintenance purposes. The system can 
be enabled by simply plugging the required number of 
headsets into the desired intercom jacks. There are 
intercom jacks located in each section and in the 
MCU. Up to four headsets may be on-line at any time. 

6.2 System Start-up 

The start-up sequence for the system is as follows: 

1. Bring up system power. 

2. Autoload MCU. 

3. Master clear the system from the CPU. 

   This master clear: 

   a. Initializes the CPU - clears all control 
      flip-flops, data flags, interrupts and error 
      flip-flops. 

   b. Sets monitor mode in the CPU (Job Mode FF 
      cleared in step a). 

4. Load microcode into the Stream Unit and the 
   Scalar Unit from the MCU. 

5. The MCU sends an external flag to the I/O 
   stations required on-line. The stations, on 
   receiving this flag, will autoload and enter an 
   idle loop waiting for a channel flag from the 
   CPU. An alternative approach is to manually 
   autoload each of the stations desired on-line. 

6. The MCU loads the operating system kernel into 
   Main Memory, then interrupts the CPU. The 
   CPU recognizes the interrupt and executes a 
   partial exchange to start execution in monitor 
   mode. This exchange is the same as a normal job 
   to monitor exchange except the contents of the 
   Register File are not stored. Program execution 
   starts at the address contained in monitor's 
   register six just as it does after a normal I/O 
   interrupt. 
PROGRAMMED DEVICE CONTROLLER DESCRIPTION 


A Programmed Device Controller (PDC) is a unit which adapts data channels or peripheral controllers to 
a serial data trunk. By means of the PDC and the serial trunk, a set of processors, or processors and 
peripherals, may be conveniently and efficiently interconnected. 

Figure C-1 illustrates a processor to processor data link construct using the serial trunk for connectivity 
and PDC’s for adapting the CPU channels to the trunk. A CPU can present messages (and data) to 
the PDC, and can accept messages (and data) from the PDC. The PDC is an agent for inserting messages 
on the trunk and for selecting messages from the trunk. In this case the PDC is neither the originator 
of an activity nor the recipient, but rather the means for conveying the bit stream defining the activity. 



Figure C-1. Processor To Processor Data Link 

Figure C-2 illustrates an interconnect of host processors and peripheral units. The function performed 
by the PDC for the host processors here is the same as that in the processor-to-processor data link: to 
act as the agent for message (and data) delivery, being neither the originator nor the recipient of a 
message. The PDC adapting a peripheral unit to a trunk, on the other hand, is itself the originator or 
recipient of messages. As such, it has the function of message interpretation and execution as well as 
the message delivery function. 


[Diagram labels: HOST PROCESSORS; PERIPHERALS (AND CONTROLLERS)] 

Figure C-2. Processor/Peripheral Subsystem Network 

This type of PDC, having an additional function to satisfy, requires resources in addition to those of the 
processor-adapting PDC, namely execution time and memory; the design must be capable of such extension. 

The PDC consists of transmission control units, message and data buffers, a device interface, and a 
processor to manage these resources in a way which satisfies the PDC functions. 

1.0 TRANSMISSION CONTROL UNIT (TCU) 

The TCU decouples the message transmission protocol from the functional definition of the message. The 
message transmission protocol includes: 

o Trunk protocol (a bit-oriented protocol similar to SDLC) 

o Contention resolution 

o Access control 

o Message closure 

and defines how a message is moved from a TCU buffer interface, down the trunk, to another TCU 
buffer interface, and how a message disposition status is returned. 

2.0 BUFFERS 

The PDC buffer decouples the data rate of the attached device (processor channel, peripheral interface) 
from the 50-megabit data rate of the trunk. The buffer has two parts, a message section and a data 
section. 

Messages have a predefined format and are of fixed length. Data transfers, from the viewpoint of the 
attached unit, can be of any length. The PDC blocks and unblocks long data fields for transmission on 
the trunk. The data buffer functions as a circular buffer. When half full, the PDC begins transmitting 
the data block on the trunk, while at the same time the device continues outputting data to the PDC 
(other half of the data buffer). The receiving PDC performs a similar function, placing the received data 
blocks into its data buffer in a circular fashion. Thus the sending PDC waits for its attached device 
to deliver a block (half buffer) of data, “bursts” this data on the trunk, and waits for the next block. 
The trunk is available to other communicating PDC’s between bursts. 
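
The blocking behavior described above can be sketched as follows. The function name `trunk_bursts` and the treatment of a short final block (sent as is) are assumptions made for illustration; the sketch models only the division into half-buffer blocks, not the concurrent filling of the other half.

```python
def trunk_bursts(data, buffer_size):
    """Split a device transfer into half-buffer blocks, one per trunk burst."""
    if buffer_size < 2:
        raise ValueError("buffer must hold at least two units")
    half = buffer_size // 2
    # Each block is burst on the trunk once its half of the buffer is full;
    # the trunk is free for other PDC's between bursts.
    return [data[i:i + half] for i in range(0, len(data), half)]

blocks = trunk_bursts(bytes(range(10)), buffer_size=8)
print([len(b) for b in blocks])
```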

Messages, as they arrive off the trunk, are placed into the message buffer one after the other until 
either the message buffer becomes full, or a message arrives with associated data. If the message buffer 
is full, the TCU returns a BUSY response to the originating TCU. If the message is accepted by this 
PDC, an ACK is returned to the originating PDC. The PDC will accept one message with associated 
data at a time. No additional messages will be accepted until the process defined by the data message 
and its data has been concluded. 
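
A minimal sketch of the message acceptance rule follows. The class and method names are hypothetical. The text states that no further messages are accepted while a data message is in progress; reporting that refusal as BUSY is an assumption of this sketch.

```python
class MessageBuffer:
    """Models acceptance of messages arriving off the trunk."""

    def __init__(self, capacity):
        self.capacity = capacity        # number of fixed-length message slots
        self.queue = []
        self.data_in_progress = False   # True while a data message is active

    def offer(self, message, has_data=False):
        """Return the response sent back to the originating TCU."""
        if self.data_in_progress or len(self.queue) >= self.capacity:
            return "BUSY"               # assumption: refusal is signalled as BUSY
        self.queue.append(message)
        if has_data:
            self.data_in_progress = True
        return "ACK"

    def data_concluded(self):
        """Called when the process defined by the data message has finished."""
        self.data_in_progress = False

buf = MessageBuffer(capacity=4)
print(buf.offer("m1"), buf.offer("m2", has_data=True), buf.offer("m3"))
```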


3.0 DEVICE INTERFACE 

This is the logic necessary to match the device control and data characteristics to the PDC. A channel 
interface includes data assembly/disassembly, resync, ready/resume, and whatever control line and function 
translation is required. This logic also includes voltage/impedance (V/Z) matching. The interface to a 
peripheral controller is essentially the same as a channel interface. 
The major difference between a PDC interfacing with a processor channel and one interfacing with a 
peripheral controller is the question of control. A PDC is passive to a channel, and active to a 
controller. Thus a channel PDC reacts to the commands of the channel; the state of this PDC is controlled 
by the channel. The situation is reversed for the peripheral controller PDC. Here the state of the 
peripheral controller is controlled by the PDC; the PDC, in effect, appears to the peripheral controller 
to be a channel. 


4.0 PROCESSOR 

This is the programmable element in the programmable device controller. The software executed by the 
processor includes a basic set, which is found in all PDC’s, and an application-oriented set, i.e., the 
channel PDC, peripheral controller PDC (and its variations). 

5.0 BASIC SOFTWARE SET 

o Message transmission control 

1. Interprets response to message 

a. ACK - Message received. Transmission is completed. Set status for channel. 

b. BUSY - Message was not accepted. Transmission is completed. Set status for 
channel. 

c. No Answer - Contention, transmission error, access violation, or failed PDC 
(also nonexistent). Perform a retry operation. 

Results (a, b, or c) determine next action. 

o Buffer management 

1. Message queue - FIFO. Manage queue pointers. Set up address registers for both the 
channel and the TCU interfaces. 

2. Data buffer management. Set up address and length registers. 

o Data blocking 

1. Synchronize the transmission of data blocks on the trunk with data motion on the 
channel. 

o PDC state control 

1. Busy/available (for messages), transmit/receive data, connected/disconnected, autoload. 
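
The response interpretation under message transmission control can be sketched as a small retry loop. The function name, the `max_retries` limit, and the returned status strings are hypothetical; the text specifies only that ACK and BUSY complete the transmission and that No Answer leads to a retry.

```python
def send_with_retry(transmit, max_retries=3):
    """Send a message and interpret the response.

    `transmit` is a callable that sends the frame and returns "ACK",
    "BUSY", or None when no answer is received.
    """
    for _ in range(1 + max_retries):
        response = transmit()
        if response == "ACK":
            return "received"        # a. transmission completed
        if response == "BUSY":
            return "not accepted"    # b. transmission completed
        # c. no answer: contention, error, access violation, or failed PDC
    return "no answer"               # retry limit (an assumption) exhausted

answers = iter([None, None, "ACK"])
print(send_with_retry(lambda: next(answers)))
```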

6.0 CHANNEL PDC SOFTWARE SET 

o Channel Interface 

1. Output message, output data - interpret selects. 

2. Input message, input data - interpret selects. 

3. END OF OP - synchronous with final data message. 

4. STATUS 

5. ABORT - send message to destination PDC similar to the end of operation message. 

6. MASTER CLEAR 

a. Hardware - reset pointers. 

b. Software (function) - Abort data input/output if in progress. 

7.0 PERIPHERAL CONTROLLER PDC SOFTWARE SET 

o Channel extension 

1. Convert command words to channel functions. 

2. Input/output data. 

3. Data blocking/deblocking. 

o Device driver 

1. Manage queue of read/write requests. 

2. Convert requests to the appropriate set of channel command words. 

3. Error retry/recovery. 

4. All of channel extensions. 

o Higher level function 

1. File system 

a. Message translation 

b. Resource management 

c. Catalogue 

d. Access control 

e. All of channel extensions, all of device driver 
SERIAL TRUNK CONTROL PROCEDURE 


The Programmable Device Controller Communication-Control Procedure (PDCCP) is a bit-oriented, 
code-independent, modular data link (trunk) control protocol. It is designed for a conference multipoint 
interconnect, and is meant to be line and link compatible, to the greatest extent practical, with Control Data 
Communication Control Procedure (CDCCP). CDCCP is presently under development as a Control Data 
Corporate Standard and is not yet available for publication. 

CDCCP, which is intended for use in “datacomm” networks, i.e., those using relatively low-speed common 
carrier facilities, expressly forbids multipoint interconnects, whereas the serial trunk system allows them. 
This constitutes the principal difference in the design philosophies of the two protocols, and arises 
because of the widely divergent application requirements. 

This document defines in detail the frame structure used in all PDCCP transmissions. It describes the 
structure, formatting, and significance of the various fields in the frame as well as frame delimiting flags 
and frame check sequences. 


1.0 FRAME STRUCTURE 
1.1 GENERAL 

The vehicle for all command, response, and information transmission is called a frame. A frame is a 
sequence of contiguous bits bounded by and including opening and closing flag sequences. There are two 
types of valid frames, as discussed below. 


1.1.1 TYPE-I Frame 

A valid TYPE-I frame is a minimum of 64 bits in length, including flags, and must conform to the 
following structure: 

F, T, FUN, S, P, I, FCS, F 

where 

F   = Flag Sequence 
T   = Destination Address Field 
FUN = Function Field 
S   = Source Address Field 
P   = Parameter Field 
I   = Information Field (optional) 
FCS = Frame Check Sequence 

Frames containing only link control sequences form a special case where no I field is present. 

The TYPE-I frame structure is illustrated in Figure D-1. Each element of the frame is detailed under 
Section 1.2. 


1.1.2 TYPE-II Frame 

A valid TYPE-II frame is a minimum of 80 bits in length, including flags, and must conform to the 
following structure: 

F, T, FUN, AC, S, P, I, FCS, F 

where 

AC = Access Code Field 

and all other elements are identical to the elements of the TYPE-I frame. 

Frames containing only link control sequences form a special case where no I field is present. 

The TYPE-II frame structure is illustrated in Figure D-2. Each element of the frame is detailed under 
Section 1.2. 
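
The minimum frame lengths can be checked by summing field widths. In the sketch below, the 8-bit flag, the single-octet T and FUN fields, and the 16-bit FCS are stated elsewhere in this document; the 8-bit widths for S and P and the 16-bit width for AC are assumptions chosen because they reproduce the stated minimums of 64 and 80 bits.

```python
# Field widths in bits. S, P and AC widths are assumptions (see lead-in).
FIELD_BITS = {"F": 8, "T": 8, "FUN": 8, "AC": 16, "S": 8, "P": 8, "FCS": 16}

# Minimum-length frames carry no I field.
TYPE_I = ["F", "T", "FUN", "S", "P", "FCS", "F"]
TYPE_II = ["F", "T", "FUN", "AC", "S", "P", "FCS", "F"]

def minimum_bits(layout):
    """Return the frame length in bits, including both flags."""
    return sum(FIELD_BITS[name] for name in layout)

print(minimum_bits(TYPE_I), minimum_bits(TYPE_II))
```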


1.1.3 Frame Type Limitations 

Command frames may be either TYPE-I or TYPE-II frames. Response frames are always TYPE-I frames. 
On a given trunk, all command frames must be of the same type. 

1.2 FRAME ELEMENTS 
1.2.1 Flag Sequence (F) 

All frames open and close with the flag sequence. This sequence has the binary configuration 01111110, 
that is, a zero-bit followed by six one-bits, followed by a zero-bit. 

The opening flag serves as a position reference for the address and control fields, and initiates 
transmission error checking. The closing flag serves as a position reference for the frame check sequence. 

Transmitters must send only complete 8-bit flags. All receivers attached to the data link must search 
continuously, on a bit-by-bit basis, for the flag sequence. Thus, the flag sequence provides frame 
synchronization. 

An F may be followed by a frame, another F, or an idle line. An F which closes a frame may also be 
used as the opening F on a following frame. Any number of F’s may be transmitted between frames. 

Since the F sequence brackets and synchronizes the frame, it must be prevented from occurring in any 
field of the frame. This is accomplished by the zero-insertion technique described below. 

Each transmitter must insert a zero-bit following five contiguous one-bits anywhere between the opening 
and closing flag sequences. The insertion of the zero-bit thus applies to the address, control, information, 
and FCS fields and effectively prevents the fortuitous transmission of the F sequence 01111110. 
Figure D-1. TYPE-I Frame Structure 

Figure D-2. TYPE-II Frame Structure 


Each receiver, after detecting the opening flag (start of frame), continuously monitors the received bit 
stream and removes any zero-bit which follows a succession of five contiguous one-bits. Note that zero 
insertion at the transmitter follows the computation of FCS and that zero-deletion at the receiver 
precedes the FCS check process. 
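
Zero insertion and zero deletion can be sketched on bit strings as follows. The function names are hypothetical, and bits are modeled as the characters "0" and "1" for readability rather than as a serial stream. The deletion routine assumes its input is the content between flags, so that the bit after five ones is always an inserted zero.

```python
def insert_zeros(bits):
    """Transmitter: insert a zero-bit after every five contiguous one-bits."""
    out, ones = [], 0
    for b in bits:
        out.append(b)
        ones = ones + 1 if b == "1" else 0
        if ones == 5:
            out.append("0")   # inserted zero breaks up any would-be flag
            ones = 0
    return "".join(out)

def delete_zeros(bits):
    """Receiver: remove the zero-bit that follows five contiguous one-bits."""
    out, ones, drop_next = [], 0, False
    for b in bits:
        if drop_next:         # this bit is the inserted zero
            drop_next = False
            ones = 0
            continue
        out.append(b)
        ones = ones + 1 if b == "1" else 0
        if ones == 5:
            drop_next = True
    return "".join(out)

payload = "01111110"          # field content that happens to look like a flag
sent = insert_zeros(payload)
print(sent, delete_zeros(sent) == payload)
```

Because the FCS is computed before insertion and checked after deletion, the inserted zeros never enter the check computation.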

Receivers must be capable of recognizing the following sequences as containing one or more flags. 

a. FCS 01111110 T FUN 
       <-Flag-> 

The flag can be detected as a valid closing flag for one frame and a valid opening flag 
for the next frame whether or not the first frame was addressed to this receiver. This 
is a combined opening and closing flag. 

b. FCS 011111101111110 T FUN 
       <-Flag-> 
              <-Flag-> 

Although transmitters must send only complete 8-bit flags, receivers will detect this sequence 
as 2 flags. 

c. 01111110 XX ... XX 01111110 

where X is any combination of bits not comprising a flag. The number of X bits can range 
from 0 upward. 
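
The bit-by-bit flag search can be illustrated with a sliding 8-bit window. The function name is hypothetical, and the bit stream is again modeled as a string of "0" and "1" characters.

```python
FLAG = "01111110"

def flag_positions(bits):
    """Return the starting index of every 8-bit window equal to the flag."""
    return [i for i in range(len(bits) - 7) if bits[i:i + 8] == FLAG]

# Case b: two flags sharing a single zero-bit are both detected.
print(flag_positions("011111101111110"))
```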

1.2.2 Destination Address Field (T) 

The Destination Address Field (T) immediately follows the opening flag of a frame and precedes the 
function field. 

Two addressing modes are defined for this addressing field by the state of the most significant bit 
(bit zero): 

1.2.2.1 Unique Destination Address 
Bit zero = 0. The remaining 7 bits uniquely identify the destination. 

1.2.2.2 Global Destination Address 
Bit zero = 1. The frame is directed to all units on the trunk. 

1.2.3 Function Field (FUN) 

The Function Field (FUN) is located immediately following the destination address field and preceding 
the source address field (TYPE-I) or the access code field (TYPE-II). The function field is used to 
convey the commands and responses necessary to control the data link. 

The two most significant bits of the function field are reserved for link control as follows: 

Bit 0 = 0 For Command frame 
      = 1 For Response frame 

Bit 1 = 1 For Rotate Priority Command frame 
      = 0 For all other Command frames 

Note that bit 1 is reserved for command frames, but not for response frames. 

The remaining six (6) bits of the function field are available for specified commands and responses. 
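
The T and FUN fields can be decoded as sketched below, taking bit zero as the most significant bit of each octet. The function names and the returned structures are hypothetical. That bit 0 distinguishes command from response frames follows Sections 3.3 and 3.5; treating bit 1 as the Rotate Priority indicator, and always reporting the low six bits as the code, are assumptions of this sketch.

```python
def decode_destination(t):
    """Decode the 8-bit T field into an addressing mode and address."""
    if t & 0x80:                      # bit zero = 1: global address
        return ("global", None)
    return ("unique", t & 0x7F)       # remaining 7 bits identify the unit

def decode_function(fun):
    """Decode the 8-bit FUN field."""
    kind = "response" if fun & 0x80 else "command"       # bit 0
    # Bit 1 is reserved only in command frames (assumed: Rotate Priority).
    rotate = bool(fun & 0x40) if kind == "command" else None
    return {"kind": kind, "rotate_priority": rotate, "code": fun & 0x3F}

print(decode_destination(0x05), decode_function(0x41))
```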

1.2.4 Access Code Field (AC) 

The Access Code Field (AC) immediately follows the function field in the TYPE-II frame; this field 
does not exist for the TYPE-I frame. 

The AC is a “key” which must match the “lock” on the receiving unit in order that the frame be 
accepted. If the match is not made, the frame is discarded. 


1.2.5 Source Address Field (S) 

The Source Address Field (S) immediately follows the function field in the TYPE-I frame or the access 
code field in the TYPE-II frame. 

The Source Address Field identifies the unit which sent the frame. 

1.2.6 Parameter Field (P) 

The Parameter Field (P) immediately follows the Source Address field, preceding the information field. 

The parameter field provides control or status for the control message or response message respectively. 

1.2.7 Information Field (I) 

The data link control is completely transparent to the contents of the I field. The I field may, 
therefore, consist of any number of bits, in any code, related to character structure or not and limited only 
by system requirements. The I field is unrestricted as to length but it should be recognized that typical 
length is contingent on system requirements and limitations beyond the link level. Factors limiting I 
field length may include channel error characteristics, PDC buffer size, and the logical properties of the 
data. 

The fortuitous occurrence of a flag or abort sequence within the I field is prevented by the zero-insertion 
technique described in paragraph 1.2.1. 

An I field with a length of zero is specifically permitted. 
1.2.8 Frame Check Sequence (FCS) 


Each frame includes a 16-bit frame check sequence (FCS) immediately following the I field (or the P 
field if there is no I field) and preceding the closing flag. The FCS field serves to detect errors induced 
by the transmission link, thus validating transmission accuracy. The 16-bit FCS results from a 
mathematical computation on the digital value of all bits (excluding inserted zeros) in the frame including the 
destination address, function, access code, source address, parameter and information fields. 

The process is known as cyclic redundancy checking using the CCITT Recommendation V.41 generator 
polynomial of X^16 + X^12 + X^5 + 1. The transmitter’s 16-bit remainder value is initialized to all ones 
before a frame is transmitted. The binary value of the transmission is premultiplied by X^16 and then 
divided by the generator polynomial. Integer quotient values are ignored and the transmitter sends the 
complement of the resulting remainder value, high-order bit first, as the FCS field. 

At the receiver the initial remainder is preset to all ones and the same process is applied to the serial 
incoming bits. In the absence of transmission errors the final remainder is 1111000010111000 
(X^0 through X^15 respectively). 

The receiver will discard a frame in error. Subsequent retransmission of the errored block is under 
control of error recovery procedures. 
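
The FCS procedure can be sketched with a bitwise shift register. The function names are hypothetical, and the sketch processes whole octets for convenience, which assumes the checked fields total a multiple of 8 bits; the protocol itself places no such restriction on the I field. The constant 0x1D0F is the receiver remainder quoted above, written with X^15 on the left.

```python
POLY = 0x1021   # X^16 + X^12 + X^5 + 1 (the X^16 term is implicit)

def crc_register(data, reg=0xFFFF):
    """Run the CRC over `data`, most significant bit first, from all ones."""
    for byte in data:
        reg ^= byte << 8
        for _ in range(8):
            if reg & 0x8000:
                reg = ((reg << 1) ^ POLY) & 0xFFFF
            else:
                reg = (reg << 1) & 0xFFFF
    return reg

def fcs(fields):
    """Transmitter: the FCS is the complement of the remainder."""
    return crc_register(fields) ^ 0xFFFF

def frame_is_valid(fields_and_fcs):
    """Receiver: an error-free frame leaves the fixed remainder."""
    return crc_register(fields_and_fcs) == 0x1D0F

fields = bytes([0x05, 0x01, 0x09, 0x00])          # T, FUN, S, P (example values)
frame = fields + fcs(fields).to_bytes(2, "big")   # high-order bit first
print(frame_is_valid(frame))
print(format(0x1D0F, "016b")[::-1])               # remainder listed X^0 through X^15
```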

1.3 ADDITIONAL CONVENTIONS 
1.3.1 Interframe Time Fill 

Interframe time fill may be transmitted to maintain the link in an active state. Time fill may also be 
used to avoid timeouts and to hold the authority to transmit. 

When used, interframe time fill must be a series of contiguous flags which are contiguous to the closing 
flag of one frame and the opening flag of the next frame. 

1.3.2 Abort 

Abort is the process by which a PDC, in the act of transmitting a frame, decides before the end of that 
frame to terminate in an unusual manner which will cause the receiver to discard the frame. 

Aborting a frame is accomplished by transmitting at least seven consecutive one-bits with no zero- 
insertion. Receipt of seven contiguous one-bits is interpreted as an abort. 

1.3.3 Invalid Frame 

An invalid frame is defined as one not properly bounded by an opening and closing flag or one which 
is too short, e.g., less than 64 (TYPE-I) or 80 (TYPE-II) bits between flags. An aborted frame is an 
invalid frame. A PDC will ignore an invalid frame. 

1.3.4 Order of Bit Transmission 

The order of transmission for all fields is most significant bit first. 
2.0 DEVIATIONS IN PDCCP FROM CDCCP 


2.1 FRAME TYPES 

CDCCP defines one frame type; PDCCP defines two (one incorporates an access code, one does not). 


2.2 FIELD DEFINITIONS 


PDCCP defines three additional fields: 


S — Source Address Field 
P — Parameter Field 
AC — Access Code Field 


2.2.1 Address Field Definition 


The Address Field (A) of CDCCP is redefined as the Destination Address Field (T) in PDCCP as follows: 

A (CDCCP)                        T (PDCCP) 

N Octets in length               Single Octet 
where N >= 1 

Two Addressing Modes             Single Addressing Mode 
Group Addressing                 No Group Addressing 
Global Address = 11111111        Global Address = 1XXXXXXX 

Null Address = 00000000          No Null Address 
(ignored by all stations) 

Address Field refers to          Address Always Destination 
Source or Destination 


2.2.2 Control — C (Function — FUN) Field 

C (CDCCP)                        FUN (PDCCP) 

N Octets in length               Single Octet 
where N >= 1 

Content Definitions Totally Different 

2.3 MINIMUM LENGTH OF FRAME 

CDCCP                            PDCCP 

48 Bits                          64 Bits (TYPE-I) 
                                 80 Bits (TYPE-II) 

2.4 ORDER OF BIT TRANSMISSIONS 


CDCCP                            PDCCP 

Address, Control, Responses,     All Fields 
and Sequence Numbers             Most Significant Bit First 
are Low Order Bit First 
(Bit 2^0 first) 

Data (I Field) Any Order 
FCS = Most Significant Bit First 

3.0 TRANSACTIONS 
3.1 GENERAL 

A transaction is a dialogue between two units on a trunk. At least one of the units must be active 
(a PDC). Active units can transmit command or response frames. Passive units can transmit only response 
frames. The multiplexed loop controller is an example of a passive unit. 

A dialogue can be viewed as a set of command and response transmissions. Consider two units, A and B: 

    IDLE TRUNK 
    A transmits command frame(s) to B 
    Reserved trunk 
    B transmits response frame(s) to A 
    Reserved trunk 
    A transmits command frame(s) to B 
        . 
        . 
        . 
    B transmits response frame(s) to A 
    IDLE TRUNK 

A dialogue, therefore, consists of one or more sets of command/response transmissions. 

Each PDC contains one to four trunk interfaces. These interfaces are called Trunk Control Units (TCU). 
The TCU interfaces the trunk and the PDC buffer. The TCU selects frames from the trunk and presents 
frames to the trunk. 

The TCU analyzes certain frame fields as part of the select process. Likewise the TCU generates certain 
fields when presenting frames to the trunk. These TCU operations, as they pertain to the PDCCP frame, 
are discussed in detail in the following sections. 
3.2 TRANSMIT COMMAND FRAME - TYPE-I OR TYPE-II 


| F | T | FUN |          | FCS | F | 

F = Flags 

FCS = Frame Check Sequence 

FUN = Bits 0 and 1 forced to zeros. Bits 2-7 come from the buffer. 

Unlabeled portion = not generated by the TCU 


3.3 RECEIVE COMMAND FRAME - TYPE-I 
| F | T | FUN | S |          | FCS | F | 
                  |<-not analyzed by the TCU->| 

T = The transmitted address which must match this unit’s address. 

FUN = The function field which must identify this frame as a command frame (bit 0 = 0). The 
TCU recognizes a small set of functions. 

S = The source field of the first correctly received frame; it is saved by the TCU. 


3.4 RECEIVE COMMAND FRAME - TYPE-II 

| F | T | FUN | AC | S |          | FCS | F | 
                       |<-not analyzed by the TCU->| 

AC = The transmitted access code which must match the physical access code of this unit. 

All other fields are recognized identically to the TYPE-I frame of Section 3.3. 
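
The select process for command frames can be sketched as follows. The function name and the dictionary representation of a frame are hypothetical. Accepting a global destination address follows Section 1.2.2.2 and is an assumption here, since Section 3.3 mentions only a match on this unit's address; FCS checking is omitted.

```python
def select_command_frame(frame, unit_address, unit_access_code=None):
    """Return the source address to save if the frame is selected, else None.

    `frame` maps field names ("T", "FUN", "S", and "AC" for TYPE-II) to
    integer values. Pass `unit_access_code` only on a TYPE-II trunk.
    """
    t = frame["T"]
    is_global = bool(t & 0x80)                    # bit zero of T set
    if not is_global and (t & 0x7F) != unit_address:
        return None                               # addressed to another unit
    if frame["FUN"] & 0x80:
        return None                               # bit 0 = 1: a response frame
    if unit_access_code is not None and frame.get("AC") != unit_access_code:
        return None                               # key does not match the lock
    return frame["S"]                             # saved for the response's T

print(select_command_frame({"T": 0x05, "FUN": 0x01, "S": 0x09}, unit_address=5))
```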

3.5 TRANSMIT RESPONSE FRAME 

| F | T | FUN | S | P |          | FCS | F | 
                      |<-not generated by the TCU->| 

T = The value saved when receiving a command frame. See Section 3.3. 

FUN = The response, including bit 0 = 1. 

S = The physical address of this unit. 

P = The parameter field; contains unit status. 

T, FUN, S, and P are inserted by the TCU. 
3.6 RECEIVE RESPONSE FRAME 


| F | T | FUN |          | FCS | F | 
              |<-not analyzed by the TCU->| 

T = The transmitted address which must match this unit’s address. 

FUN = Bit 0 must be set. 